Scientific Sample Selection API — Reproducible RNG

What it is

A scientific-grade random sampler built on /api/ints (for index-based selection from a numbered frame) and /api/shuffle (for permutation-based selection without replacement). The seed and the algorithm are public, so any reviewer can replay the draw months or years later and confirm that the published sample matches the published methodology.

The pain point

"Random sampling" is one of the most-claimed and least-audited phrases in research methodology. Authors typically report "randomly selected N participants from the frame" and back it with a local call such as set.seed(42), but the language version, the platform's RNG implementation, and the script itself may all be unavailable or behave differently months later. A verifiable sampling API lets a paper, an audit report, or a regulatory submission carry an exact sampling trail that any third party can replay.

Try it live — draw a sample of 10 from a frame of 5,000

curl "https://api.provable.io/api/ints?clientSeed=trial_acme_phase2_sample_2026_05_25&count=10&min=1&max=5000"
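
The integers returned by /api/ints only become a sample once they are mapped onto the numbered frame. Below is a minimal sketch of that mapping, assuming the response has already been parsed into a plain array of 1-based integers (the response shape is not shown on this page, and selectByIndex is a hypothetical helper, not part of the API). Whether independent integer draws can repeat an index is not stated here, so the helper reports duplicates instead of assuming there are none.

```javascript
// Hypothetical helper: map 1-based indices onto a numbered sampling frame.
// ASSUMPTION: `indices` is a plain array of integers already parsed from the API response.
function selectByIndex(frame, indices) {
  const seen = new Set();
  const selected = [];
  const duplicates = [];
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 1 || i > frame.length) {
      throw new RangeError(`index ${i} is outside the frame [1, ${frame.length}]`);
    }
    if (seen.has(i)) {
      duplicates.push(i); // repeated index: the same unit was drawn more than once
      continue;
    }
    seen.add(i);
    selected.push(frame[i - 1]); // index 1 refers to the first unit in the frame
  }
  return { selected, duplicates };
}

// Usage with made-up indices (not real API output):
const frame = Array.from({ length: 5000 }, (_, k) => `P-${String(k + 1).padStart(4, "0")}`);
const { selected, duplicates } = selectByIndex(frame, [17, 4032, 17, 5000]);
console.log(selected);   // [ 'P-0017', 'P-4032', 'P-5000' ]
console.log(duplicates); // [ 17 ]
```

If duplicates do occur, the protocol should state in advance how they are handled, for example by discarding repeats and drawing further integers under a documented follow-up seed.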

For sampling without replacement, shuffle the numbered frame and take the first N. The first N items of a uniformly random permutation, such as the output of a Fisher-Yates shuffle, form a simple random sample:

curl "https://api.provable.io/api/shuffle?clientSeed=audit_invoices_q2_2026_srs&items=INV-001,INV-002,...,INV-015"

Integration snippet

// Stratified random sample: draw n_h units from each stratum independently,
// each with its own seed for separable replay.
async function drawStratifiedSample({ studyId, strata }) {
  const out = {};
  for (const [stratum, { frame, n }] of Object.entries(strata)) {
    const url = new URL("https://api.provable.io/api/shuffle");
    url.searchParams.set("clientSeed", `${studyId}:${stratum}`);
    url.searchParams.set("items", frame.join(","));

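    const res = await fetch(url, {
      headers: { "x-api-key": process.env.PROVABLE_KEY }
    });
    // Fail loudly: a silently skipped stratum would yield an incomplete sample.
    if (!res.ok) {
      throw new Error(`shuffle failed for stratum "${stratum}": HTTP ${res.status}`);
    }
    const { outcome, serverHash } = await res.json();
    // The first n items of the shuffled frame are this stratum's sample.
    out[stratum] = { selected: outcome.slice(0, n), serverHash };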
  }
  return out;
}

const sample = await drawStratifiedSample({
  studyId: "trial_acme_phase2",
  strata: {
    site_us: { frame: usParticipantIds,  n: 50 },
    site_eu: { frame: euParticipantIds,  n: 30 },
    site_jp: { frame: jpParticipantIds,  n: 20 },
  },
});
// Publish: each stratum's clientSeed and serverHash, the sampling frame, and the script.

Why this is reproducible

Public sampling frame. The frame (the list of IDs in the population, or its size N) is published with the protocol. The sampling step is then a pure function of the published seed and the published frame.
Cross-language replay. Because the byte stream is derived with HMAC-SHA256 rather than from language-specific RNG state, any reviewer can re-derive the sample with a Python, R, Go, or browser implementation of the open-source provable-core library.
Pre-registration friendly. Commit the serverHash in your pre-registration; reveal the serverSeed in the published paper or audit report. Reviewers can then check the revealed seed against the committed hash and re-derive the sample without re-running your entire pipeline.
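
The commit-then-reveal and replay ideas above can be sketched in a few lines of Node.js. This is an illustration only, not the provable-core algorithm: it assumes the commitment is the hex SHA-256 of the server seed, and it invents its own way of turning HMAC-SHA256 output into a shuffle (the message format `${clientSeed}:${block}` and the function names are hypothetical). A real replay should use the library's documented derivation; the sketch only shows why an HMAC-based stream yields the same permutation on any platform.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: the published commitment is the hex SHA-256 digest of the server seed.
function commitmentMatches(serverSeed, serverHash) {
  const digest = crypto.createHash("sha256").update(serverSeed).digest("hex");
  return digest === serverHash.toLowerCase();
}

// ASSUMPTION: illustrative byte stream, not the real derivation.
// Block i is HMAC-SHA256(key = serverSeed, message = `${clientSeed}:${i}`),
// read as eight big-endian unsigned 32-bit integers.
function* hmacUint32Stream(serverSeed, clientSeed) {
  for (let block = 0; ; block++) {
    const bytes = crypto
      .createHmac("sha256", serverSeed)
      .update(`${clientSeed}:${block}`)
      .digest();
    for (let off = 0; off + 4 <= bytes.length; off += 4) {
      yield bytes.readUInt32BE(off);
    }
  }
}

// Fisher-Yates shuffle driven by the stream. Rejection sampling discards
// values that would introduce modulo bias.
function replayShuffle(items, serverSeed, clientSeed) {
  const out = items.slice();
  const stream = hmacUint32Stream(serverSeed, clientSeed);
  for (let i = out.length - 1; i > 0; i--) {
    const range = i + 1;
    const limit = Math.floor(0x100000000 / range) * range; // largest multiple of range <= 2^32
    let r;
    do {
      r = stream.next().value;
    } while (r >= limit);
    const j = r % range;
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Usage with made-up seeds: two independent replays give the same order.
const invoices = ["INV-001", "INV-002", "INV-003", "INV-004", "INV-005"];
const first = replayShuffle(invoices, "example-server-seed", "audit_invoices_q2_2026_srs");
const second = replayShuffle(invoices, "example-server-seed", "audit_invoices_q2_2026_srs");
console.log(first.join(",") === second.join(",")); // true
```

Because every step is a hash computation over published strings, nothing in the replay depends on the reviewer's operating system or language runtime.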

Sampling designs that fit

Simple random sample (SRS). Shuffle the frame, take the first n.
Stratified random sample. One seeded shuffle per stratum, with a per-stratum n_h.
Cluster sample. One seeded shuffle over the list of clusters to select which clusters enter the sample, then SRS within each selected cluster.
Systematic sample. Draw a single starting integer in [1, k], where k is the sampling interval (roughly N/n), then take every k-th element from that start.
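
Two of the designs above reduce to a few lines once the seeded draw is in hand. The sketch below assumes the random start (for the systematic sample) and the shuffled orderings (for the cluster sample) have already been obtained from the API and parsed. systematicSample and clusterSample are hypothetical helper names, not part of the API.

```javascript
// Hypothetical helper: systematic sample with a 1-based random start in [1, k].
function systematicSample(frame, k, start) {
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }
  if (!Number.isInteger(start) || start < 1 || start > k) {
    throw new RangeError(`start must be an integer in [1, ${k}]`);
  }
  const selected = [];
  for (let pos = start; pos <= frame.length; pos += k) {
    selected.push(frame[pos - 1]); // positions are 1-based, arrays are 0-based
  }
  return selected;
}

// Hypothetical helper: two-stage cluster sample from already-shuffled orderings.
// `shuffledClusterIds` is the seeded ordering of cluster IDs;
// `shuffledMembersByCluster` maps each cluster ID to its seeded member ordering.
function clusterSample(shuffledClusterIds, m, shuffledMembersByCluster, n) {
  const out = {};
  for (const id of shuffledClusterIds.slice(0, m)) {
    out[id] = shuffledMembersByCluster[id].slice(0, n); // SRS within the cluster
  }
  return out;
}

// Usage with made-up data: interval k = 5, random start 3.
const units = Array.from({ length: 20 }, (_, i) => `U-${i + 1}`);
console.log(systematicSample(units, 5, 3)); // [ 'U-3', 'U-8', 'U-13', 'U-18' ]
```

When N is not a multiple of k, the systematic sample size varies by one unit depending on the start, which is worth stating in the protocol.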

Where it fits

Clinical trial randomization (allocation lists for treatment vs control).
Survey research drawing respondents from a frame.
Financial / compliance audits sampling transactions for inspection.
Election-day audits selecting precincts or ballots for a hand recount.
Quality assurance drawing units off a production line.