keinplankarriere
Technical deep-dive

How the matching engine works

Fuzzy deduplication, skill & experience extraction, and hybrid scoring — under the hood.
01 · Deduplication
02 · Skill extraction
03 · Experience extraction
04 · Hybrid scoring
~5 minute walkthrough
Where everything fits

The pipeline

Ingestion — per scraped job
Scrape: 4 boards, with full descriptions
Extract skills: taxonomy + regex
Deduplicate: fuzzy match across sources
Store: SQLite, one row per job
Matching — scoring pass
Candidate: experience base → skills & preferences
+
Job: title · description · extracted skills
Rule score: 0–100, instant & free
LLM refine: top-N only, + explanation
1 · Fuzzy deduplication

Same job, four boards

The problem. One role gets posted on LinkedIn, StepStone, Xing and Arbeitsagentur, leaving four near-identical rows.
Normalize first. Lower-case, then strip gender markers (m/w/d), company suffixes (GmbH, AG, SE) and punctuation.
Compare with difflib. SequenceMatcher ratio on title + company — Python standard library, zero dependencies, fully deterministic.
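# on normalized title + company; a and b are two job rows
from difflib import SequenceMatcher

def ratio(x, y):
    return SequenceMatcher(None, x, y).ratio()

title_sim   = ratio(a.title,   b.title)
company_sim = ratio(a.company, b.company)

# both thresholds must hold for a duplicate
is_dup = (title_sim   >= 0.82 and
          company_sim >= 0.80)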

Confidence = 0.6 · title + 0.4 · company. Only cross-source pairs merge into one row (recorded in also_on); a board updating its own posting is handled by the job-id upsert.
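
The three steps above (normalize, compare, combine) can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the `Job` dataclass, the `normalize` and `compare` helpers and the regular expressions are assumptions made for the example; only the thresholds (0.82 / 0.80), the 0.6 / 0.4 weights and the cross-source rule come from the slide.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

# Illustrative patterns only; the real marker and suffix lists are not shown in the deck.
GENDER_MARKERS = re.compile(r"\(?\b[mwdfx](?:\s*/\s*[mwdfx])+\b\)?")
COMPANY_SUFFIXES = re.compile(r"\b(?:gmbh|ag|se)\b")
PUNCTUATION = re.compile(r"[^\w\s]")


def normalize(text: str) -> str:
    """Lower-case, drop gender markers, company suffixes and punctuation."""
    text = text.lower()
    text = GENDER_MARKERS.sub(" ", text)
    text = COMPANY_SUFFIXES.sub(" ", text)
    text = PUNCTUATION.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace


@dataclass
class Job:  # hypothetical row shape
    source: str
    title: str
    company: str


def ratio(x: str, y: str) -> float:
    return SequenceMatcher(None, x, y).ratio()


def compare(a: Job, b: Job) -> tuple[bool, float]:
    """Return (is_duplicate, confidence) for a pair of jobs."""
    title_sim = ratio(normalize(a.title), normalize(b.title))
    company_sim = ratio(normalize(a.company), normalize(b.company))
    # Only cross-source pairs are merge candidates.
    is_dup = (a.source != b.source
              and title_sim >= 0.82
              and company_sim >= 0.80)
    confidence = 0.6 * title_sim + 0.4 * company_sim
    return is_dup, confidence
```

Comparing every pair is quadratic in the number of jobs; the deck does not say how candidates are narrowed down, so any blocking step (for example by company) would be a further assumption.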

2 · Skill extraction

From description to tags

Curated taxonomy. ~70 canonical skills, each with aliases — React / ReactJS / React.js all map to “React”. English + German.
Word-boundary regex. One pre-compiled pattern per skill, so “Java” ≠ “JavaScript” and “Go” ≠ “Google”.
Runs at ingestion. Scans the title and the full scraped description, then stores the normalized skill tags on the job.
“…REST APIs in Python and Django, deployed on AWS with Docker…”
↓ extract
Python · Django · REST · AWS · Docker
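
A rough sketch of how alias-based, word-boundary matching could look. The `TAXONOMY` dictionary below is a tiny made-up slice (the real list of roughly 70 skills is not shown), and `compile_skill` and `extract_skills` are hypothetical names. Lookarounds stand in for `\b` so that aliases ending in punctuation, such as "React.js", still get a clean boundary.

```python
import re

# Hypothetical excerpt: canonical skill -> aliases.
TAXONOMY = {
    "Python": ["python"],
    "Django": ["django"],
    "REST": ["rest", "restful"],
    "AWS": ["aws", "amazon web services"],
    "Docker": ["docker"],
    "React": ["react", "reactjs", "react.js"],
    "Java": ["java"],
    "JavaScript": ["javascript"],
    "Go": ["go", "golang"],
}


def compile_skill(aliases):
    """Build one case-insensitive pattern covering all aliases of a skill."""
    # Longest alias first so "react.js" wins over "react".
    ordered = sorted(aliases, key=len, reverse=True)
    alternation = "|".join(re.escape(alias) for alias in ordered)
    # (?<!\w) and (?!\w): no word character directly before or after the match.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


# Compiled once at import time, reused for every job.
PATTERNS = {skill: compile_skill(aliases) for skill, aliases in TAXONOMY.items()}


def extract_skills(title: str, description: str) -> list[str]:
    """Return canonical skills found in the title or description."""
    text = f"{title}\n{description}"
    return [skill for skill, pattern in PATTERNS.items() if pattern.search(text)]
```

Boundaries alone do not separate the skill "Go" from the verb "go", or "REST" from "the rest of the team"; a case-sensitive pattern for such short aliases is one possible refinement, though the deck does not say how this is handled.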
3 · Experience extraction

Your CV → structured experience

01 · Input
CV PDF (pypdf) or pasted LaTeX / text
02 · LLM parse
→ JSON: title, org, stack, dates, summary, tags
03 · Review
human-in-the-loop popup before save
04 · Store
experience base drives matching

Two ways in

  • Upload a CV → AI parses every role & project
  • Add manually → AI infers the type, tags & stack
  • You review and edit before anything is saved

Made robust

  • Non-reasoning model — emits JSON, not “thinking”
  • Tolerant JSON extraction + schema validation
  • Self-healing: auto-picks an available model
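
"Tolerant JSON extraction + schema validation" might look roughly like this. The field names come from the parse step above; the expected types, and the helper names `extract_json` and `validate`, are assumptions for illustration. The idea is to locate the first decodable JSON object even when the model wraps it in prose, then check it before it reaches the review popup.

```python
import json

# Field names from the slide; the expected types are assumed.
REQUIRED_FIELDS = {
    "title": str,
    "org": str,
    "stack": list,
    "dates": str,
    "summary": str,
    "tags": list,
}


def extract_json(raw: str):
    """Return the first JSON object embedded in raw text, or None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = raw.find("{", start + 1)
    return None


def validate(entry: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the entry is usable."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            errors.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors
```

An entry that fails validation could be retried or handed to the reviewer with the problems flagged; which of these the tool does is not stated.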
4 · Hybrid scoring

Deterministic first, LLM where it counts

Layer 1 — rule score · every job · instant · free
Skills 45 · Role 20 · Location 10 · Remote 10 · Seniority 10 · Salary 5 (total 100)

Skills = the share of the job’s requirements the candidate has → more experience only ever helps.
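
As a worked illustration of the weights, here is one way the rule score could be assembled. Only the weights and the definition of the skills component come from the slide; how the role, location, remote, seniority and salary fits are computed is not described, so this sketch simply takes them as fractions between 0 and 1. `skill_coverage` and `rule_score` are hypothetical names.

```python
WEIGHTS = {
    "skills": 45,
    "role": 20,
    "location": 10,
    "remote": 10,
    "seniority": 10,
    "salary": 5,
}


def skill_coverage(job_skills, candidate_skills) -> float:
    """Share of the job's required skills that the candidate has."""
    required = {s.lower() for s in job_skills}
    if not required:
        return 0.0  # assumption: the deck does not say how skill-less jobs are scored
    have = {s.lower() for s in candidate_skills}
    return len(required & have) / len(required)


def rule_score(job_skills, candidate_skills, other_fits) -> float:
    """Weighted sum on a 0-100 scale; other_fits maps component name -> fit in [0, 1]."""
    fits = dict(other_fits)
    fits["skills"] = skill_coverage(job_skills, candidate_skills)
    total = 0.0
    for name, weight in WEIGHTS.items():
        fit = min(max(fits.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        total += weight * fit
    return total
```

Because the denominator is the job's requirement count rather than the candidate's skill count, adding unrelated skills to a profile leaves the score unchanged and adding relevant ones raises it, which is the "more experience only ever helps" property.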

Layer 2 — LLM refinement

  • only the top-N rule candidates
  • re-scored 0–100, grounded in the real experiences
  • returns a written explanation
  • rate-limited (1.5 s + 429 back-off)
  • falls back to the rule score on any failure
Plus a separate call that ranks which experiences to emphasize per job.
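
The control flow of Layer 2 (top-N only, pacing, back-off on 429, fallback to the rule score) can be sketched as below. Everything about the LLM client is assumed: `llm_score` is a hypothetical callable returning `(score, explanation)`, `RateLimited` stands in for whatever error signals HTTP 429, and the doubling back-off and retry count are illustrative choices. Only the 1.5 s pause and the fallback behaviour come from the slide.

```python
import time


class RateLimited(Exception):
    """Stand-in for the error an LLM client raises on HTTP 429."""


def refine_top_n(scored_jobs, llm_score, top_n=10, pause=1.5,
                 max_retries=3, sleep=time.sleep):
    """scored_jobs: iterable of (job_id, rule_score). Returns {job_id: result}."""
    ranked = sorted(scored_jobs, key=lambda item: item[1], reverse=True)
    # Every job starts with its rule score; the LLM may overwrite the top N.
    results = {
        job_id: {"score": rule, "explanation": None, "source": "rule"}
        for job_id, rule in ranked
    }
    for job_id, _rule in ranked[:top_n]:
        delay = pause
        for _attempt in range(max_retries):
            try:
                score, explanation = llm_score(job_id)
            except RateLimited:
                delay *= 2          # back off, then retry
                sleep(delay)
                continue
            except Exception:
                break               # any other failure: keep the rule score
            results[job_id] = {
                "score": score,
                "explanation": explanation,
                "source": "llm",
            }
            break
        sleep(pause)                # fixed spacing between jobs
    return results
```

Passing `sleep` as a parameter keeps the function testable without real waiting.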
Design principles

What ties it together

Deterministic first

Dedup, skills and the base score need no LLM — fast, free, reproducible.

LLM where it judges

Reserved for parsing CVs, refining the top matches and explaining them.

Always grounded

Scores and CVs come from real, reviewed experience — no fabrication.

Resilient by design

Standard-library core, fallbacks everywhere, self-healing model choice.
