Jack Pan

Jack Pan's blog: notes on data platforms, computer vision, and engineering
https://jackpan.me/

Phase 1: Canonical filenames buy zero-config batching
https://jackpan.me/posts/canonical-filenames-zero-config-batching/
A short naming convention like `NN_NNN_ego.mp4` encodes enough metadata to run the entire batch with one CLI flag. Why this is cheap to add, and what it forces you to *not* build.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Don't use Label Studio Source Storage for local files
https://jackpan.me/posts/dont-use-label-studio-source-storage/
A short warning. LS's "Cloud Storage → Source Storage" feature looks like exactly what you want for local data. Use it and you get tens of thousands of phantom tasks that collide with the ones you actually imported.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Four signals for review-frame sampling
https://jackpan.me/posts/four-signal-frame-sampling/
A uniform "every Nth frame" sampler wastes labeler time on near-identical frames the model already nailed. Four signals do better: segment boundaries, per-segment uniform, low confidence, bbox jumps.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Keep tests pure-Python with lazy imports
https://jackpan.me/posts/lazy-imports-pure-python-tests/
`mediapipe`, `ultralytics`, `cv2` are slow to import and need model weights at runtime. The trick that keeps the test suite small, fast, and weights-free is putting those imports inside function bodies, not at module top level.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: One module owns every path on disk
https://jackpan.me/posts/one-module-owns-every-path/
Why a video pre-annotation pipeline ends up with one `layout` module that knows where everything lives, and what breaks when six different parts of the codebase each compute paths their own way.
Mon, 11 May 2026 00:00:00 GMT

Phase 2: the eval set must never see pre-annotations
https://jackpan.me/posts/phase-2-clean-eval-set/
The single decision that most human-in-the-loop projects get wrong. If your eval labels were seeded by the model's own predictions, every F1 number you ever report is biased toward the model. The fix is cheap on day one, expensive on day forty.
Mon, 11 May 2026 00:00:00 GMT

Phase 2: what counts as ground truth
https://jackpan.me/posts/phase-2-ground-truth/
When you harvest Label Studio exports to fine-tune the next model, treating *everything* in the export as ground truth is how you train the model on its own predictions. The filters worth applying before a single byte goes into a training set.
Mon, 11 May 2026 00:00:00 GMT

Phase 2: don't retrain on every export
https://jackpan.me/posts/phase-2-retraining-cadence/
After every batch of human-corrected episodes gets exported from Label Studio, the temptation is to retrain immediately. The reasons not to, and a cheap cadence trigger that actually fires when retraining will help.
Mon, 11 May 2026 00:00:00 GMT

Phase 2: version the slice, not the snapshot
https://jackpan.me/posts/phase-2-version-the-slice/
When you fine-tune model v3, you need to be able to answer "which exported corrections went into it". Snapshotting the whole training set is the obvious answer and the wrong one. Track the inputs and the derivation; the training set is a function of them.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Why one episode becomes three Label Studio projects
https://jackpan.me/posts/three-label-studio-projects/
A deep dive on the multi-project pattern for video pre-annotation: what forces the split, how one episode fans out, and when not to fight Label Studio's data model.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Two fps knobs in a video pre-annotation pipeline
https://jackpan.me/posts/two-fps-knobs/
Inference frame rate and review-frame sampling look like one thing and aren't. What each knob actually buys, and what breaks if you treat them as the same.
Mon, 11 May 2026 00:00:00 GMT

Phase 1: Notes from building a video pre-annotation pipeline
https://jackpan.me/posts/video-pre-annotation-pipeline/
A Phase-1 pipeline for embodied-robot video data (MediaPipe + YOLO inference, action segmentation, Label Studio import), plus the boring path-abstraction decision that kept it from collapsing.
Mon, 11 May 2026 00:00:00 GMT

CIT CTF 2026 · Debug Disaster: a leaky debug page and a forgotten route
https://jackpan.me/posts/cit-ctf-2026-debug-disaster/
Flask debug=True leaks more than tracebacks: it leaked the source code of a forgotten route that dumps .env in cleartext.
Sun, 26 Apr 2026 00:00:00 GMT

CIT CTF 2026 · A Massive Problem: mass assignment via dict.update
https://jackpan.me/posts/cit-ctf-2026-mass-assignment/
The challenge name spells it out. At register time, record.update(incoming) lets the role field in the request body overwrite the hard-coded default.
Sun, 26 Apr 2026 00:00:00 GMT

CIT CTF 2026: a few writeups worth keeping
https://jackpan.me/posts/cit-ctf-2026-overview/
I played CIT CTF 2026 over the holiday. This is the index post for a short series of writeups covering Web, Crypto and Misc challenges.
Sun, 26 Apr 2026 00:00:00 GMT

CIT CTF 2026 · Baby Exponent: the most textbook RSA e=3
https://jackpan.me/posts/cit-ctf-2026-baby-exponent/
Public exponent e=3, plaintext small enough that m³ never overflowed the modulus. Integer cube root and done.
Sat, 25 Apr 2026 00:00:00 GMT

CIT CTF 2026 · Dog Barking: three bark durations, one custom encoding
https://jackpan.me/posts/cit-ctf-2026-dog-barking/
78 seconds of dog barks. Three distinct bark durations encode bit 0, bit 1, and the byte separator. Not Morse, but a custom code.
Sat, 25 Apr 2026 00:00:00 GMT

CIT CTF 2026 · Server Components: RCE via Next.js 15 RSC deserialization
https://jackpan.me/posts/cit-ctf-2026-server-components/
package.json pins next@15.0.4, squarely inside the window for this year's React Server Components deserialization RCE.
Sat, 25 Apr 2026 00:00:00 GMT