
On-Device Processing

The SDK preprocesses video frames locally before uploading anything. This page covers what runs in the browser, what gets uploaded, and how fallbacks work.

The SDK performs preprocessing, not inference. Here is the division of work:

| Stage | Where |
| --- | --- |
| Camera access | Browser |
| Face detection and tracking | Browser (WASM) |
| Skin region extraction and normalization | Browser (WASM) |
| Quality checks (lighting, motion, pose) | Browser |
| Frame encoding and upload | Browser |
| Vital signs inference | Server-side |
| Signal processing and calibration | Server-side |

The SDK does not include any inference models. All signal processing and vital signs computation happens on the backend.

The SDK includes the Circadify Vision Engine — a set of WebAssembly modules that handle face detection, skin region extraction, and frame normalization. These modules are lazy-loaded from CDN on first use and cached by the browser for subsequent measurements.

| Component | Size | Caching |
| --- | --- | --- |
| Face detection model | ~6 MB | Browser cache (persistent) |
| Geometry processor | ~6 MB | Browser cache (persistent) |

For air-gapped or regulated environments where external CDN access is restricted, you can host the WASM files on your own infrastructure:

```javascript
const sdk = new CircadifySDK({
  apiKey: 'ck_live_your_key_here',
  wasmConfig: {
    mediapipeWasmUrl: 'https://cdn.yourcompany.com/circadify/wasm/',
    mediapipeModelUrl: 'https://cdn.yourcompany.com/circadify/models/',
    opencvUrl: 'https://cdn.yourcompany.com/circadify/geometry.js',
  },
});
```

Contact support@circadify.com for the WASM distribution package.

For each captured frame, the SDK executes the following pipeline:

  1. Face detection — The Vision Engine locates the face in the video frame and tracks facial geometry in real time. If no face is detected for 30 seconds, the SDK throws a FACE_DETECTION_TIMEOUT error.

  2. Skin region extraction — Multiple skin regions are identified and isolated from the detected face. These regions are selected for their suitability for rPPG signal extraction, where blood flow changes produce subtle color variations.

  3. Normalization — Each extracted region is geometrically normalized to a consistent size and orientation, correcting for head movement and rotation.

  4. Encoding — Normalized regions are encoded into a compact binary format optimized for the backend inference model.

  5. Frame accumulation — Frames are captured at 30 FPS and accumulated over approximately 24 seconds of measurement. Capture stops when sufficient frames have been collected.

The entire pipeline runs in the browser. No raw video frames, face images, or identifiable data are included in the upload — only the preprocessed, normalized skin region data.
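The capture loop can be sketched as the following state machine. This is illustrative only: `detectFace`, `extractRegions`, `normalize`, and `encode` are hypothetical stand-ins for the Vision Engine internals, which the SDK does not expose.

```javascript
// Illustrative sketch of the per-frame pipeline described above.
// The stage functions passed in via `stages` are hypothetical stand-ins
// for the Vision Engine internals.
const FPS = 30;
const CAPTURE_SECONDS = 24;
const TARGET_FRAMES = FPS * CAPTURE_SECONDS; // ~720 frames per measurement
const FACE_TIMEOUT_MS = 30000;               // FACE_DETECTION_TIMEOUT threshold

function createCaptureState(now = Date.now()) {
  return { frames: [], lastFaceSeen: now };
}

function processFrame(frame, state, stages, now = Date.now()) {
  const face = stages.detectFace(frame);              // 1. face detection
  if (!face) {
    if (now - state.lastFaceSeen > FACE_TIMEOUT_MS) {
      throw new Error('FACE_DETECTION_TIMEOUT');      // no face for 30 s
    }
    return { done: false };
  }
  state.lastFaceSeen = now;
  const regions = stages.extractRegions(frame, face); // 2. skin region extraction
  const normalized = regions.map(stages.normalize);   // 3. geometric normalization
  state.frames.push(stages.encode(normalized));       // 4. compact binary encoding
  return { done: state.frames.length >= TARGET_FRAMES }; // 5. frame accumulation
}
```

The real SDK drives this loop internally; the sketch only makes the control flow (timeout, accumulation target) concrete.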

The SDK runs continuous quality checks during capture. All checks must pass before measurement begins, and warnings are emitted if quality degrades mid-capture.

Lighting — Checks that the scene is well-lit and stable. Fails if the environment is too dark, too bright, or has flickering light sources.

Motion — Measures how much the user is moving. Fails if there is excessive head or body movement.

Face position — Checks head orientation. Fails if the user’s head is turned or tilted beyond the acceptable range.

Quality warnings are delivered via the onQualityWarning callback:

```javascript
const sdk = new CircadifySDK({
  apiKey: 'ck_test_your_key_here',
  onQualityWarning: (warning) => {
    // warning.type: 'lighting' | 'motion' | 'face_position'
    // warning.severity: 'low' | 'medium' | 'high'
    showToast(warning.message);
  },
});
```

On-device processing requires the following browser capabilities:

| Requirement | Purpose |
| --- | --- |
| HTTPS (or localhost) | Camera access |
| WebAssembly | Vision Engine execution |
| Canvas 2D | Frame processing |
| navigator.mediaDevices | Camera stream |
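These requirements can be verified up front. A minimal sketch, written against a plain environment object so it is testable outside a browser; in a real page you would pass `window`:

```javascript
// Sketch: pre-flight check for the capabilities listed above.
// Pass the global object (window in a browser). Returns the names of
// missing capabilities, or an empty array if on-device processing can run.
function missingCapabilities(env) {
  const missing = [];
  const secure = env.isSecureContext ||
    (env.location && env.location.hostname === 'localhost');
  if (!secure) missing.push('HTTPS (or localhost)');   // camera access
  if (typeof env.WebAssembly === 'undefined') {
    missing.push('WebAssembly');                       // Vision Engine execution
  }
  if (typeof env.CanvasRenderingContext2D === 'undefined') {
    missing.push('Canvas 2D');                         // frame processing
  }
  if (!env.navigator || !env.navigator.mediaDevices) {
    missing.push('navigator.mediaDevices');            // camera stream
  }
  return missing;
}
```

In application code, call `missingCapabilities(window)` before starting a measurement and show a support message listing anything it returns.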

Minimum browser versions:

| Browser | Minimum Version |
| --- | --- |
| Chrome | 80+ |
| Firefox | 75+ |
| Safari | 14+ |
| Edge | 80+ |
Typical performance characteristics:

| Metric | Typical Value |
| --- | --- |
| Vision Engine load (first visit) | 2–5 seconds |
| Vision Engine load (cached) | Under 100 ms |
| Capture duration | ~24 seconds |
| Upload size | ~45 MB |
| Upload time | 5–30 seconds (network dependent) |
| Backend inference | 60–90 seconds |
| Total end-to-end | ~2 minutes |
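The capture and upload figures are consistent with each other. A back-of-envelope check, assuming the ~45 MB upload is evenly distributed across frames:

```javascript
// Back-of-envelope check of the figures above (approximate table values).
const fps = 30;
const captureSeconds = 24;
const frames = fps * captureSeconds;                    // 720 frames per measurement
const uploadBytes = 45 * 1024 * 1024;                   // ~45 MB upload
const bytesPerFrame = Math.round(uploadBytes / frames); // 65536 bytes ≈ 64 KiB per frame
```

So each encoded frame carries roughly 64 KiB of normalized skin region data, far less than a raw video frame.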

If you choose not to use @circadify/sdk, you must implement the preprocessing pipeline yourself to produce the binary format the backend expects. This includes camera access, face detection, skin region extraction, normalization, and frame encoding.

Custom preprocessing implementations require access to our format specification. Contact support@circadify.com for documentation.

When backend inference fails, the SDK still returns a result:

  1. At session creation, the backend returns a fallback_config object with plausible vital sign ranges
  2. The SDK uses these ranges to generate synthetic values if the result endpoint returns an error
  3. Fallback results always have confidence: 0.0
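A small helper makes the fallback case explicit in application code. A sketch only; the 0.4 low-confidence threshold is illustrative, matching the usage example on this page:

```javascript
// Sketch: interpreting a measurement result given the fallback behavior above.
// The 0.4 threshold is illustrative, not an SDK constant.
function classifyResult(result) {
  if (result.confidence === 0) return 'fallback';       // synthetic values from fallback_config
  if (result.confidence < 0.4) return 'low_confidence'; // real measurement, poor signal quality
  return 'ok';
}
```

Checking the exact-zero case before the threshold matters: a fallback result would otherwise be indistinguishable from an ordinary low-confidence measurement.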

Your application should always check the confidence score:

```javascript
const result = await sdk.measureVitals({
  container: document.getElementById('scan-container'),
});

// Check the exact-zero fallback case first: 0 also satisfies < 0.4,
// so testing the threshold first would make this branch unreachable.
if (result.confidence === 0) {
  showError('Measurement could not be completed. Please retry.');
} else if (result.confidence < 0.4) {
  showWarning('Low confidence — results may be unreliable. Try again with better lighting.');
}
```

A confidence score of 0.0 specifically indicates that fallback values were used and the result should not be treated as a real measurement.