# Recommended Strategy

Fantastic.jobs is geared towards platforms that ingest new jobs on a recurring basis. We recommend starting with the `active-ats` or `active-jb` endpoints with a `1h` or `24h` `time_frame`.

- For the `1h` `time_frame`, make your requests once every hour.
- For the `24h` `time_frame`, make your request during the same hour every day - for example, between 2 and 3 AM.

This way you will never receive duplicate jobs in your database.

### Good to know

The `1h` and `24h` `time_frame`s serve jobs with a one-hour delay (UTC) to give enrichments time to complete:

- **`1h`** - If you call the endpoint at 09:15 AM you will receive all jobs indexed between 07:00 AM and 08:00 AM.
- **`24h`** - If you call the endpoint on 2026-01-02 at 09:15 AM you will receive all jobs indexed between 2026-01-01 07:00 AM and 2026-01-02 08:00 AM.

### description_format

The next parameter to set is `description_format`. The two options are `text` (a clean plain-text version of the description) and `html` (the raw original HTML).
If you don't set this parameter, the API will not return a description field.
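
For illustration, an hourly poll with clean-text descriptions might look like the sketch below. The base URL and auth header are placeholders rather than documented values - substitute whatever your account setup specifies:

```python
import requests  # third-party HTTP client: pip install requests

BASE_URL = "https://api.example.com/v0.9"  # placeholder - use your real base URL
HEADERS = {"x-api-key": "YOUR_API_KEY"}    # placeholder - auth header may differ

# Hourly poll with clean-text descriptions included in each job.
resp = requests.get(
    f"{BASE_URL}/active-ats",
    headers=HEADERS,
    params={
        "time_frame": "1h",
        "description_format": "text",
        "limit": 1000,
        "offset": 0,
    },
)
resp.raise_for_status()
jobs = resp.json()  # assumed: a JSON array of job objects
```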

### Pagination

To retrieve all the jobs for your `time_frame`, you might need to make multiple requests while increasing the `offset`.

First, set a preferred `limit` between 100 and 1,000. This is the number of jobs returned per API request. If the number of jobs returned equals the `limit`, make another request, increasing `offset` by the `limit`. Keep making requests until the API returns fewer jobs than the `limit`, which means you have retrieved all jobs for this `time_frame`.

For example:

```
request 1: limit=1000&offset=0
request 2: limit=1000&offset=1000
request 3: limit=1000&offset=2000
```
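
In code, that loop might look like the following Python sketch (the base URL, auth header, and response shape are assumptions - adapt them to your setup):

```python
import requests

BASE_URL = "https://api.example.com/v0.9"  # placeholder
HEADERS = {"x-api-key": "YOUR_API_KEY"}    # placeholder

def fetch_time_frame(endpoint: str, time_frame: str, limit: int = 1000) -> list:
    """Page through one time_frame with limit + offset until exhausted."""
    jobs, offset = [], 0
    while True:
        resp = requests.get(
            f"{BASE_URL}/{endpoint}",
            headers=HEADERS,
            params={"time_frame": time_frame, "limit": limit, "offset": offset},
        )
        resp.raise_for_status()
        batch = resp.json()  # assumed: a list of job objects
        jobs.extend(batch)
        if len(batch) < limit:  # short page = no more rows in this window
            return jobs
        offset += limit

all_jobs = fetch_time_frame("active-ats", "1h")
```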

### Other key parameters

**`include_basic_organization_details`**

Set this parameter to `true` to include organization-related fields in the API response. If you're already using `organization_advanced`, this data is duplicative and can be omitted.

**`location`**

Filter the results to specific locations using natural language:

```
location="United States" OR Canada OR "United Kingdom"
```

For more details on location search, please refer to our guide.

**`title`**

Filter the results to specific job titles using natural language:

```
title="Software Engineer" OR "Data Scientist"
```

## Expired jobs
After setting up the retrieval of active jobs, we recommend configuring the expired jobs endpoints on the same day. This way you can start removing expired jobs from day 1.
We recommend the `1h` `time_frame` if possible, as it reduces the delta between when a job is flagged in our system and when it is removed on your end.

The `expired-ats` and `expired-jb` endpoints simply return arrays of internal job IDs that have expired in the selected window. Match each ID against your stored `id` values to invalidate stale rows.

- Use `time_frame=1h` for hourly polling (rolling 1-hour window, refreshed continuously).
- Use `time_frame=1d` for daily syncs (snapshot of the previous UTC day, refreshed once per day at 01:00 UTC; not a rolling 24-hour window).
- Avoid `time_frame=6m` for routine polling - returns tens of millions of IDs and should only be used in case of an outage.

> **Coverage caveat for `expired-jb`:** only LinkedIn listings (`source=linkedin`) are re-checked for expiration. `wellfound` and `ycombinator` listings will never show up in the expired feed, so if you ingest them you'll need to implement your own freshness check (e.g. drop anything older than N days). ATS expiry coverage is unaffected - every ATS source is re-checked.
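
As an illustration, a daily invalidation pass could look like the sketch below. The table and column names are made up for the example, and the endpoint is assumed to return a flat JSON array of IDs as described above:

```python
import sqlite3
import requests

BASE_URL = "https://api.example.com/v0.9"  # placeholder
HEADERS = {"x-api-key": "YOUR_API_KEY"}    # placeholder

resp = requests.get(
    f"{BASE_URL}/expired-ats",
    headers=HEADERS,
    params={"time_frame": "1d"},
)
resp.raise_for_status()
expired_ids = resp.json()  # assumed: a flat JSON array of internal job IDs

# Flag matching rows rather than deleting them outright; the "jobs" table
# and "is_expired" column are illustrative names for your own schema.
conn = sqlite3.connect("jobs.db")
conn.executemany(
    "UPDATE jobs SET is_expired = 1 WHERE id = ?",
    [(job_id,) for job_id in expired_ids],
)
conn.commit()
conn.close()
```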

## Backfill

When you first integrate - or whenever you need to seed your database with historical jobs that pre-date your hourly/daily sync - use the `7d` and `6m` `time_frame`s on `active-ats` and `active-jb`.

- **`7d`** - all jobs indexed in the last 7 days. Use this if you only need the most recent week of history.
- **`6m`** - all jobs indexed in the last 6 months. This is the deepest backfill window we offer.

Both windows refresh every 1 to 3 minutes with a roughly 45-minute ingestion delay (vs. the one-hour delay on `1h`/`24h`).

Use `date_posted_gte`/`date_posted_lt` (or `date_created_gte`/`date_created_lt`) to slice the window into smaller chunks if you'd rather ingest progressively.  
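
If you'd rather slice by day, a sketch along these lines walks the window in one-day `date_created` chunks (the dates are illustrative; each slice still needs to be paginated to completion - prefer `cursor` on `6m`, as described below):

```python
from datetime import datetime, timedelta, timezone

# Walk a backfill window in one-day date_created slices.
# Start/end dates are illustrative; the format matches the examples in this guide.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
end = datetime(2026, 7, 1, tzinfo=timezone.utc)
day = timedelta(days=1)

slice_start = start
while slice_start < end:
    params = {
        "time_frame": "6m",
        "date_created_gte": slice_start.strftime("%Y-%m-%dT%H:%M:%S"),
        "date_created_lt": (slice_start + day).strftime("%Y-%m-%dT%H:%M:%S"),
        "limit": 1000,
    }
    # ...paginate this slice to completion (cursor-based, see below)...
    slice_start += day
```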

### Pagination

`7d` works fine with `limit` + `offset`, identical to the strategy described above for hourly/daily syncs.

`6m` returns far too many rows for offset-based pagination to be efficient at deep pages. We strongly recommend using `cursor` instead.

### `limit` + `cursor`

`cursor` works as follows:

- Set `cursor` to the last `id` returned by the previous batch.
- The result ordering switches from `date_posted` descending to `id` ascending - this is what makes cursor pagination stable across pages.
- Keep going until the API returns fewer rows than `limit`.

Example sequence:

```
request 1: limit=200&cursor=1
request 2: limit=200&cursor=<last id from request 1>
request 3: limit=200&cursor=<last id from request 2>
...
```

If both `cursor` and `offset` are passed, `cursor` wins and `offset` is ignored - pick one and stick with it.
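
Putting that together, a cursor-based pagination loop might look like this sketch (same placeholder base URL and auth as before; `cursor=1` starts from the lowest `id`, matching the sequence above):

```python
import requests

BASE_URL = "https://api.example.com/v0.9"  # placeholder
HEADERS = {"x-api-key": "YOUR_API_KEY"}    # placeholder

def fetch_with_cursor(endpoint: str, time_frame: str, limit: int = 200) -> list:
    """Page through one time_frame with limit + cursor until exhausted."""
    jobs, cursor = [], 1  # cursor=1 starts from the lowest id
    while True:
        resp = requests.get(
            f"{BASE_URL}/{endpoint}",
            headers=HEADERS,
            params={"time_frame": time_frame, "limit": limit, "cursor": cursor},
        )
        resp.raise_for_status()
        batch = resp.json()  # assumed: list of jobs ordered by id ascending
        jobs.extend(batch)
        if len(batch) < limit:
            return jobs
        cursor = batch[-1]["id"]  # next page starts after the last id seen

all_jobs = fetch_with_cursor("active-ats", "6m")
```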

### Caveats

- For `active-jb`, description search (`description`, `description_advanced`) is **not** supported on `time_frame=6m` and will return a 400.
- For `active-ats`, description search on `6m` is allowed but can be slow - shrink the window or simplify the query if you hit timeouts.

## Modified Jobs

`modified-ats` is the companion endpoint to `active-ats` that lets you keep an existing copy of the dataset in sync without re-fetching the full feed.

- **ATS only.** There is no `modified-jb` - LinkedIn and the other JB sources are not re-checked for field-level modifications.
- Contains every ATS job whose tracked fields (title, description, location, salary, apply URL, …) changed in the last 24 hours.
- Typically returns between 100,000 and 150,000 jobs per day.

### How to call it

We recommend calling `modified-ats` **once per day at the same time**. As long as you stick to a fixed cadence you will never receive duplicate modifications between runs. Pagination is identical to `active-ats` - use `limit` + `offset` and keep going until the API returns fewer rows than `limit`. `cursor` is also available if you prefer it, but note the ordering caveat below.

### Two extra response fields

In addition to the full `ActiveAtsJob` shape, every row also includes:

- **`date_modified`** - the UTC timestamp when we detected the change. Default sort order is `date_modified` descending. **Caveat:** if you paginate with `cursor` instead of `offset`, results are ordered by `id` ascending instead - the same ordering switch as on the active endpoints. Pick one strategy and stick with it.
- **`modified_fields`** - a string array listing exactly which fields changed (e.g. `["title", "ai_salary_min_value"]`). Use this to apply targeted updates instead of overwriting whole rows.
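
For example, a targeted update could be as simple as this sketch (the stored-row shape is illustrative):

```python
def apply_modification(stored: dict, modified_row: dict) -> dict:
    """Overwrite only the fields the API says changed."""
    for field in modified_row.get("modified_fields", []):
        stored[field] = modified_row.get(field)
    # Record when the change was detected so the next sync can reason about it.
    stored["date_modified"] = modified_row["date_modified"]
    return stored
```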

### Posted-date modifications

Some ATS platforms bump a job's `date_posted` very frequently to keep it looking "fresh". To avoid drowning your sync in cosmetic changes, we only treat a posted-date change as a modification if the new `date_posted` is older than 14 days. Modifications to other fields are always reported.

## Retrieving jobs after technical issues

If your ingestion pipeline goes down for longer than the `1h` / `24h` window covers, recover the missed jobs from the `7d` or `6m` backfill endpoints.

The trick is to use **`date_created_gte`** (rather than `date_posted_gte`) to define the recovery window. `date_created` reflects when our system first indexed the job - it is always populated and monotonically increasing, so it's the only field that lets you precisely express "everything indexed since my last successful run" without missing edge cases. See [Time Fields](/documentation/time-fields) for the full distinction.

### Recipe

1. Look up the most recent `date_created` you successfully ingested - call this `T_last`.
2. Pick the smallest `time_frame` that fully covers the gap:
   - Outage shorter than 7 days → `time_frame=7d`.
   - Outage longer than that → `time_frame=6m`.
3. Add `date_created_gte=T_last` to bound the lower edge of the recovery query.
4. Optionally add `date_created_lt=<cutoff>` so the recovery run hands off cleanly to your normal hourly/daily sync.
5. Paginate to completion - `offset` on `7d`, `cursor` on `6m`.
6. Resume your usual `1h` / `24h` polling.

This guarantees you replay exactly the jobs you missed - no gaps, no duplicates.

### Example: 2-hour outage

Say you run an hourly sync against `active-ats` and the last call that successfully completed pulled jobs indexed up to **2026-04-27 14:00 UTC**. Your pipeline then breaks and you don't recover until **16:30 UTC** - a roughly 2-hour gap covering the 14:00 and 15:00 hourly windows.

Once you're back up, fire a single recovery request against the `7d` window, scoped to exactly the missing slice:

```
GET /v0.9/active-ats
  ?time_frame=7d
  &date_created_gte=2026-04-27T14:00:00
  &date_created_lt=2026-04-27T16:00:00
  &limit=1000
  &offset=0
```

Paginate by bumping `offset` until the response returns fewer rows than `limit`. After that completes, resume your usual hourly polling at **17:00 UTC** as if nothing happened - the recovery window ends exactly where the next hourly call begins, so no jobs are dropped or duplicated.

