Producer - Getting Started

The producer is a NestJS microservice using Kafka transport. It discovers AA news metadata, queues downloads, and fetches NewsML XML from the AA API (or from local mock data when credentials are not provided). Fetched NewsML is parsed into structured fields; raw and parsed data are persisted in Postgres (via TypeORM), short-lived state and run analytics are tracked in Redis, and content messages are published onto Kafka topics consumed by downstream services.

Prerequisites

  • Postgres: the service persists metadata, raw XML, parsed content and run analytics in Postgres entities.
  • Redis: used for short-lived state (dedup keys, last-fetch timestamp) and transient analytics counters.
  • Kafka: used for triggers and for publishing downloaded/parsed content messages.

Configuration keys

The service reads configuration from environment variables (as defined in the repository's config schema). Key names and their defaults (from the code) include:

  • AA_API_URL (default: https://api.aa.com.tr/abone/)
  • AA_API_USER, AA_API_PASS (AA API credentials; optional)
  • KAFKA_BROKERS (default: kafka:9092)
  • REDIS_URL (default: redis://redis:6379)
  • DB_HOST, DB_PORT, DB_USER, DB_PASS, DB_NAME (Postgres connection keys; defaults defined in the config schema)
  • FETCH_CRON (cron expression; a default exists in the config schema)
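
A minimal environment file for local development might look like the fragment below. The AA and Postgres values are placeholders (the exact Postgres defaults live in the repository's config schema and are not reproduced here); the URL, broker, and Redis values are the documented defaults:

```
AA_API_URL=https://api.aa.com.tr/abone/
AA_API_USER=your-aa-user        # optional; omit to run against mock data
AA_API_PASS=your-aa-pass        # optional
KAFKA_BROKERS=kafka:9092
REDIS_URL=redis://redis:6379
DB_HOST=postgres                # placeholder
DB_PORT=5432                    # placeholder
DB_USER=postgres                # placeholder
DB_PASS=postgres                # placeholder
DB_NAME=producer                # placeholder
FETCH_CRON=0 0 */1 * * *
```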

If AA_API_USER / AA_API_PASS are not set, the service falls back to a mock content path (a MockService implementation generates sample items).
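
The credential check can be sketched as a small pure function. `resolveContentSource` is a hypothetical name for illustration; the real service wires this decision through its DI container:

```typescript
// Decides whether to hit the real AA API or the mock content path.
// Mirrors the documented behavior: mock mode whenever either
// credential is missing or empty.
type ContentSource = 'api' | 'mock';

function resolveContentSource(user?: string, pass?: string): ContentSource {
  const hasCredentials = Boolean(user && user.trim()) && Boolean(pass && pass.trim());
  return hasCredentials ? 'api' : 'mock';
}
```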

How the service is triggered

  • Scheduled runs: the repository includes a SchedulerService with a hardcoded @Cron("0 0 */1 * * *") decorator (every hour at :00). The FETCH_CRON environment variable is read from config and logged at startup, but it does not change the active schedule — the decorator expression is fixed in source. To change the schedule, the code must be modified.
  • Manual runs: the service also listens for a Kafka message on the fetch_content_aa topic; the controller consumes that message and calls the same FetchService.runFetch() code path.
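
The two trigger paths can be sketched with conventional NestJS decorators. This is a non-runnable wiring sketch: the class and method names are illustrative, not the repository's actual ones, and `FetchService` is assumed to be injectable:

```typescript
import { Controller, Injectable } from '@nestjs/common';
import { Cron } from '@nestjs/schedule';
import { EventPattern } from '@nestjs/microservices';
import { FetchService } from './fetch.service'; // assumed injectable service

@Injectable()
export class SchedulerService {
  constructor(private readonly fetchService: FetchService) {}

  // Hardcoded in source; FETCH_CRON is read and logged but not applied here.
  @Cron('0 0 */1 * * *')
  handleScheduledFetch() {
    return this.fetchService.runFetch();
  }
}

@Controller()
export class FetchController {
  constructor(private readonly fetchService: FetchService) {}

  // Manual trigger: any message on fetch_content_aa starts the same run.
  @EventPattern('fetch_content_aa')
  handleManualFetch() {
    return this.fetchService.runFetch();
  }
}
```

Both paths converge on the same `runFetch()` code path, so a manual Kafka trigger behaves identically to a scheduled run.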

What the producer publishes and consumes

  • The code declares and uses several Kafka topics. Notable topics: raw_content_aa_download (emitted when metadata needs to be downloaded), and raw_content_aa (emitted after a document is fetched, parsed and saved). The producer consumes raw_content_aa_download to perform downloads and consumes fetch_content_aa for manual triggers.

Note: topic declarations and topic creation are handled in the Kafka wrapper service at startup; some environments require explicit broker permissions for topic creation.
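
The exact message schema is not documented here, so the shape below is an assumption for illustration: a hypothetical `buildDownloadMessage` helper producing the kind of payload a raw_content_aa_download consumer would need to fetch one document:

```typescript
// Hypothetical payload for the raw_content_aa_download topic:
// enough information for the download consumer to fetch one document.
// Field names are assumptions, not the repository's actual schema.
interface DownloadMessage {
  newsItemId: string;   // AA item identifier from discovery
  type: string;         // e.g. 'text' or 'picture'
  discoveredAt: string; // ISO timestamp, useful for run analytics
}

function buildDownloadMessage(newsItemId: string, type: string, discoveredAt: Date): string {
  const msg: DownloadMessage = {
    newsItemId,
    type,
    discoveredAt: discoveredAt.toISOString(),
  };
  // Kafka message values are emitted as JSON strings.
  return JSON.stringify(msg);
}
```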

Data storage overview (conceptual)

  • Discovered items are persisted as metadata rows. Raw NewsML XML is persisted to a raw table while parsing occurs. Parsed content is persisted as a separate content row. Run/aggregate analytics are also persisted after each run.
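
As a rough sketch, the raw-XML table might map to a TypeORM entity like the one below. Entity and column names here are illustrative assumptions; the repository's actual entities may differ:

```typescript
import { Entity, PrimaryGeneratedColumn, Column, CreateDateColumn } from 'typeorm';

// Illustrative shape only: one row per fetched NewsML document,
// keyed back to its discovered metadata item.
@Entity('raw_content')
export class RawContent {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column()
  newsItemId: string; // reference to the discovered metadata row

  @Column({ type: 'text' })
  xml: string; // raw NewsML as fetched from the AA API

  @CreateDateColumn()
  fetchedAt: Date;
}
```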

Operational notes & troubleshooting

  • Rate limits: the AA API client throws a specific rate-limit error on 429 responses. The fetch workflow increments analytics counters and includes a short delay between requests — see the service constants in the code for exact behavior.
  • Deduplication: short-lived dedup keys are stored in Redis (the code prefixes keys with aan_source_ and uses a TTL). If items are unexpectedly skipped as duplicates, inspect the Redis keys.
  • Cron vs config: the @Cron decorator in SchedulerService is hardcoded to "0 0 */1 * * *". The FETCH_CRON config value is only read for logging — it has no effect on when the scheduler fires.
  • Topic creation: the Kafka wrapper attempts to create topics via the Kafka admin client; if your broker denies topic creation, create topics beforehand or adjust broker permissions.
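
The dedup behavior described above follows a SET-if-new-with-TTL pattern. The snippet below simulates it in memory for illustration: the real service uses Redis, the aan_source_ prefix matches the documented key prefix, but the TTL value and class name are assumptions:

```typescript
// In-memory stand-in for Redis `SET key value NX EX ttl`.
// markIfNew returns true when the key was newly set (item is fresh)
// and false when a non-expired key already exists (duplicate).
class DedupStore {
  private expiry = new Map<string, number>();

  constructor(private readonly ttlMs: number, private readonly now: () => number = Date.now) {}

  markIfNew(sourceId: string): boolean {
    const key = `aan_source_${sourceId}`; // documented key prefix
    const t = this.now();
    const expiresAt = this.expiry.get(key);
    if (expiresAt !== undefined && expiresAt > t) {
      return false; // still within TTL: treat as duplicate
    }
    this.expiry.set(key, t + this.ttlMs);
    return true;
  }
}
```

Once the TTL elapses the same source ID is accepted again, which is why an item can legitimately reappear downstream after the dedup window expires.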