CV Matcher Example

AI-powered CV / Job Description matching pipeline built with kdeps.

For each CV and job description (JD) pair, the workflow:

1. Parses both documents (PDF, DOCX, JSON, TXT, or URL via scraper)
2. Extracts structured data and skills with an LLM
3. Indexes skill embeddings into a SQLite vector DB (cached; skills already indexed are skipped)
4. Scores the match by skill category (software_dev, platform, cloud, data, ml_ai, security, general)
5. If the overall score meets the match threshold (overall_score >= 0.65):
   - Generates a motivation letter and a tailored CV
   - Renders a full match-report PDF
   - Uploads the PDF to S3 (presigned URL) and/or Google Drive
   - Appends a row to an existing Google Sheet via an inline Python script
   - Emails the distribution list with an HTML summary and the PDF attachment
6. Returns a structured JSON result via apiResponse
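The cached indexing step can be pictured with a small sketch. This is not the workflow's actual implementation: the table name `skill_embeddings`, its schema, and the `fake_embed` stand-in are all hypothetical, chosen only to show how a lookup before embedding avoids repeated work for skills that are already known.

```python
import json
import sqlite3


def fake_embed(skill: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model call."""
    # Deterministic toy vector so the sketch runs without a model.
    return [float(len(skill)), float(sum(map(ord, skill)) % 97)]


def index_skills(conn: sqlite3.Connection, skills, embed=fake_embed) -> int:
    """Embed only skills not yet stored; return how many were newly embedded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS skill_embeddings ("
        "skill TEXT PRIMARY KEY, vector TEXT NOT NULL)"
    )
    added = 0
    for raw in skills:
        skill = raw.strip().lower()  # assumed normalization, not from the README
        known = conn.execute(
            "SELECT 1 FROM skill_embeddings WHERE skill = ?", (skill,)
        ).fetchone()
        if known:
            continue  # cache hit: skip the embedding call entirely
        conn.execute(
            "INSERT INTO skill_embeddings (skill, vector) VALUES (?, ?)",
            (skill, json.dumps(embed(skill))),
        )
        added += 1
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
first = index_skills(conn, ["Python", "Docker"])       # both are new
second = index_skills(conn, ["python", "Kubernetes"])  # "python" is already cached
print(first, second)
```

Keying the table on the normalized skill name keeps the check to a single primary-key lookup per skill.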

Prerequisites

Tool / Service	Required for
`wkhtmltopdf`	PDF generation (`generate-report-pdf` step)
Ollama or compatible LLM	Skill extraction, scoring, letter generation
SMTP server	Email distribution (`send-email` step)
Google Cloud service account	Google Sheets append (`append-sheet` step)
S3 presigned URL	S3 upload (`upload-s3` step, optional)
Google OAuth2 token	GDrive upload (`upload-gdrive` step, optional)

Install wkhtmltopdf:

# macOS
brew install wkhtmltopdf

# Debian / Ubuntu
apt install wkhtmltopdf
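
Before starting the server, it may help to confirm that the required command-line tools are on PATH. The helper below is an illustrative sketch, not part of kdeps; `missing_tools` is a hypothetical name.

```python
import shutil


def missing_tools(tools):
    """Return the command names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# wkhtmltopdf comes from the prerequisites table; extend the list as needed.
missing = missing_tools(["wkhtmltopdf"])
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found")
```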

Configuration

Environment Variables

Variable	Description
`SMTP_HOST`	SMTP server hostname (e.g. `smtp.gmail.com`)
`SMTP_PORT`	SMTP port (default: `587`)
`SMTP_USERNAME`	SMTP authentication username
`SMTP_PASSWORD`	SMTP authentication password
`SMTP_FROM`	Sender address
`GOOGLE_APPLICATION_CREDENTIALS`	Path to Google Cloud service account JSON
`LLM_MODEL`	Model name (e.g. `llama3`, `claude-haiku`)
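
As a rough illustration of how these variables fit together, the sketch below reads the SMTP settings and applies the documented default port of 587. Treating the other four SMTP variables as mandatory is an assumption of this sketch, and `SmtpConfig` and `load_smtp_config` are hypothetical names rather than kdeps APIs.

```python
import os
from dataclasses import dataclass


@dataclass
class SmtpConfig:
    host: str
    port: int
    username: str
    password: str
    sender: str


def load_smtp_config(env=None) -> SmtpConfig:
    """Build an SmtpConfig from environment-style key/value pairs."""
    env = os.environ if env is None else env
    # Assumption: these four must be set; the table only documents a default for the port.
    required = ["SMTP_HOST", "SMTP_USERNAME", "SMTP_PASSWORD", "SMTP_FROM"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return SmtpConfig(
        host=env["SMTP_HOST"],
        port=int(env.get("SMTP_PORT") or "587"),  # documented default
        username=env["SMTP_USERNAME"],
        password=env["SMTP_PASSWORD"],
        sender=env["SMTP_FROM"],
    )


# Example with an explicit mapping instead of the real environment.
example = load_smtp_config({
    "SMTP_HOST": "smtp.example.com",
    "SMTP_USERNAME": "user",
    "SMTP_PASSWORD": "secret",
    "SMTP_FROM": "noreply@example.com",
})
print(example.host, example.port)
```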

LLM backend

Set the model in settings.agentSettings inside workflow.yaml, or override it at request time via the llm_model body field.

Usage

Start the kdeps server:

kdeps run examples/cv-matcher/workflow.yaml

The API listens on port 16399 and accepts POST requests at /match.

Request body

{
  "cv_path": "/path/to/candidate.pdf",
  "cv_type": "pdf",
  "jd_path": "/path/to/job-description.pdf",
  "jd_type": "pdf",
  "distribution_list": ["hr@example.com", "hiring-manager@example.com"],
  "s3_presigned_url": "https://bucket.s3.amazonaws.com/upload?...",
  "gdrive_token": "ya29...",
  "gdrive_folder_id": "1AbCdEf...",
  "sheets_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
  "sheets_tab": "Matches"
}

Field	Type	Required	Description
`cv_path`	string	yes	Local file path or URL to the CV
`cv_type`	string	no	`pdf`, `docx`, `json`, `txt`, `url` (default: `pdf`)
`jd_path`	string	yes	Local file path or URL to the job description
`jd_type`	string	no	Same options as `cv_type`
`distribution_list`	[]string	yes	Email recipients for the match summary
`s3_presigned_url`	string	no	S3 PUT presigned URL for the match-report PDF
`gdrive_token`	string	no	Google OAuth2 bearer token for Drive upload
`gdrive_folder_id`	string	no	Google Drive folder ID to upload into
`sheets_id`	string	no	ID of the Google Sheet to append a match row to
`sheets_tab`	string	no	Sheet/tab name (default: `Matches`)
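
A client can check a request body against this table before sending it. The sketch below is one possible client-side check, not the server's own validation; `normalize_request` is a hypothetical helper, and rejecting an empty distribution list is an assumption. Because the table states no default for `jd_type`, the sketch leaves it unset when absent.

```python
ALLOWED_TYPES = {"pdf", "docx", "json", "txt", "url"}


def normalize_request(body: dict) -> dict:
    """Validate a /match request body and fill in the documented defaults."""
    errors = []
    for field in ("cv_path", "jd_path"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} must be a non-empty string")
    recipients = body.get("distribution_list")
    if (
        not isinstance(recipients, list)
        or not recipients
        or not all(isinstance(r, str) for r in recipients)
    ):
        errors.append("distribution_list must be a non-empty list of strings")
    normalized = dict(body)
    normalized.setdefault("cv_type", "pdf")         # documented default
    normalized.setdefault("sheets_tab", "Matches")  # documented default
    for field in ("cv_type", "jd_type"):
        if field in normalized and normalized[field] not in ALLOWED_TYPES:
            errors.append(f"{field} must be one of {sorted(ALLOWED_TYPES)}")
    if errors:
        raise ValueError("; ".join(errors))
    return normalized


body = normalize_request({
    "cv_path": "/data/candidates/jane-smith.pdf",
    "jd_path": "/data/jobs/senior-backend-engineer.pdf",
    "distribution_list": ["hr@example.com"],
})
print(body["cv_type"], body["sheets_tab"])
```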

Example

curl -X POST http://localhost:16399/match \
  -H "Content-Type: application/json" \
  -d '{
    "cv_path": "/data/candidates/jane-smith.pdf",
    "jd_path": "/data/jobs/senior-backend-engineer.pdf",
    "distribution_list": ["hr@example.com", "cto@example.com"],
    "sheets_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms"
  }'
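
The same request can be issued from Python using only the standard library. This is a minimal sketch: `build_match_request` and `post_match` are hypothetical helper names, and the 300-second timeout is an assumption made because LLM-backed steps can be slow.

```python
import json
import urllib.request


def build_match_request(payload: dict, base_url: str = "http://localhost:16399"):
    """Build (but do not send) the POST /match request."""
    return urllib.request.Request(
        url=base_url + "/match",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_match(payload: dict, base_url: str = "http://localhost:16399", timeout: float = 300):
    """Send the request to a running kdeps server and decode the JSON reply."""
    request = build_match_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "cv_path": "/data/candidates/jane-smith.pdf",
    "jd_path": "/data/jobs/senior-backend-engineer.pdf",
    "distribution_list": ["hr@example.com", "cto@example.com"],
    "sheets_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
}
request = build_match_request(payload)
print(request.get_method(), request.full_url)
# result = post_match(payload)  # uncomment once the server is running
```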

Response

{
  "candidate_name": "Jane Smith",
  "job_title": "Senior Backend Engineer",
  "overall_score": 0.87,
  "is_match": true,
  "category_scores": {
    "software_dev": 0.92,
    "platform": 0.85,
    "cloud": 0.80,
    "data": 0.75,
    "general": 0.60
  },
  "report_pdf": "/tmp/kdeps/cv-match-jane-smith-20260307.pdf",
  "gdrive_link": "https://drive.google.com/file/d/1AbCdEf.../view",
  "s3_link": "https://bucket.s3.amazonaws.com/reports/jane-smith.pdf",
  "email_sent": true,
  "sheet_row_appended": true
}

When is_match is false, the report_pdf, gdrive_link, s3_link, and email_sent fields are omitted and sheet_row_appended reflects whether the non-match was still recorded in the spreadsheet.
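
Because several fields are absent for non-matches, client code should read them defensively. The sketch below shows one way to do that; `summarize_match` is a hypothetical helper and the summary wording is arbitrary.

```python
def summarize_match(result: dict) -> str:
    """Produce a one-line summary that tolerates fields omitted for non-matches."""
    name = result.get("candidate_name", "unknown candidate")
    title = result.get("job_title", "unknown role")
    score = float(result.get("overall_score", 0.0))
    head = f"{name} / {title}: score {score:.2f}"
    if not result.get("is_match", False):
        recorded = "yes" if result.get("sheet_row_appended") else "no"
        return f"{head}, no match (recorded in sheet: {recorded})"
    # Match-only fields: use .get() so a skipped optional upload does not raise.
    report = result.get("report_pdf", "n/a")
    links = [result[key] for key in ("s3_link", "gdrive_link") if result.get(key)]
    emailed = "yes" if result.get("email_sent") else "no"
    return f"{head}, match (report: {report}, links: {len(links)}, emailed: {emailed})"


print(summarize_match({
    "candidate_name": "Jane Smith",
    "job_title": "Senior Backend Engineer",
    "overall_score": 0.42,
    "is_match": False,
    "sheet_row_appended": True,
}))
```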

Pipeline Steps

Step	Resource type	Description
`scrape-cv`	`scraper`	Download / read the CV file
`scrape-jd`	`scraper`	Download / read the JD file
`extract-cv`	`chat`	LLM extracts name, work history, skills from CV
`extract-jd`	`chat`	LLM extracts required / preferred / nice-to-have skills from JD
`embed-cv-skills`	`embedding`	Index CV skills into SQLite vector DB
`embed-jd-skills`	`embedding`	Index JD skills into SQLite vector DB
`compute-match`	`chat`	Score the CV against the JD by skill category
`generate-letter`	`chat`	Write a personalised motivation letter (skipped when no match)
`generate-tailored-cv`	`chat`	Produce a CV tailored to the job description (skipped when no match)
`generate-report-pdf`	`pdf`	Render the full match report as a PDF (skipped when no match)
`upload-s3`	`httpClient`	PUT the PDF to an S3 presigned URL (optional)
`upload-gdrive`	`httpClient`	POST the PDF to Google Drive REST API (optional)
`append-sheet`	`python`	Append a result row to a Google Sheet
`send-email`	`email`	Email the distribution list with HTML summary + PDF (skipped when no match)
`api-response`	`apiResponse`	Return structured JSON result
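
To make the send-email step more concrete, the sketch below assembles a comparable message with Python's standard library: an HTML summary plus the PDF report as an attachment. It illustrates the shape of the message only and says nothing about how the kdeps `email` resource works internally. The function names are hypothetical, and STARTTLS is assumed because it is the usual choice on port 587.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, html_summary, pdf_bytes, pdf_name):
    """Create a message with a plain-text fallback, an HTML body, and a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("This message contains an HTML match summary.")
    msg.add_alternative(html_summary, subtype="html")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=pdf_name
    )
    return msg


def send_report_email(msg, host, port, username, password):
    """Send the message over SMTP with STARTTLS (assumed transport security)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)


message = build_report_email(
    sender="noreply@example.com",
    recipients=["hr@example.com", "cto@example.com"],
    subject="CV match: Jane Smith / Senior Backend Engineer",
    html_summary="<h1>Match report</h1><p>Overall score: 0.87</p>",
    pdf_bytes=b"%PDF-1.4 placeholder bytes",  # a real report would be read from disk
    pdf_name="cv-match-jane-smith.pdf",
)
print(message["To"])
```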

Skill categories

Skills are classified into the following categories for scoring:

Category	Examples	Weight
`software_dev`	Python, Go, Java, C++, TypeScript	1.0
`platform`	Docker, Kubernetes, Terraform, Ansible	0.9
`cloud`	AWS, GCP, Azure, Cloudflare	0.9
`data`	SQL, Spark, Kafka, dbt, Airflow	0.85
`ml_ai`	PyTorch, TensorFlow, scikit-learn, LLMs	0.85
`security`	OWASP, penetration testing, SIEM	0.8
`general`	Jira, Confluence, MS Office, Slack	0.5

Match threshold: overall_score >= 0.65.
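
The README does not state how the category weights are combined into overall_score. The sketch below assumes a weighted mean over the categories present, purely for illustration. Applied to the category scores in the example response it yields roughly 0.81 rather than the reported 0.87, so the workflow evidently aggregates differently. `overall_score` and `is_match` are hypothetical function names.

```python
WEIGHTS = {
    "software_dev": 1.0,
    "platform": 0.9,
    "cloud": 0.9,
    "data": 0.85,
    "ml_ai": 0.85,
    "security": 0.8,
    "general": 0.5,
}
THRESHOLD = 0.65  # documented match threshold


def overall_score(category_scores: dict) -> float:
    """Weighted mean of the category scores (assumed aggregation, see lead-in)."""
    known = {c: s for c, s in category_scores.items() if c in WEIGHTS}
    total_weight = sum(WEIGHTS[c] for c in known)
    if total_weight == 0:
        return 0.0  # no scored categories: treat as no evidence of a match
    return sum(s * WEIGHTS[c] for c, s in known.items()) / total_weight


def is_match(category_scores: dict) -> bool:
    """Apply the documented threshold: overall_score >= 0.65."""
    return overall_score(category_scores) >= THRESHOLD


example = {"software_dev": 0.92, "platform": 0.85, "cloud": 0.80, "data": 0.75, "general": 0.60}
print(round(overall_score(example), 2), is_match(example))
```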
