STN: Speech-to-Note

STN is a local-first assistant for Pardus desktops that turns voice or typed requests into structured, stateful actions. Instead of switching between note apps, email, meeting tools, weather pages, alarms, and launchers, users describe what they need in one conversation while STN completes missing details, asks for approval when needed, and stores each result as a traceable local artifact.

Input: Voice or text
Reasoning: Intent + slots
Output: Approved artifact
Platform: Qt desktop

Product

Task execution from a single desktop conversation

STN solves the daily context-switching problem: users often move between mail clients, meeting tools, weather pages, alarm utilities, note apps, and launchers for small tasks. STN keeps this work inside one chat-based desktop interface while preserving local data ownership.

The end product accepts speech or text, extracts the user's intent and required parameters, asks only for missing details, waits for approval when an action can affect external systems, and then executes the task through isolated tool integrations.

Voice and text input

Microphone capture, offline STT, and typed chat share the same orchestration pipeline.

Structured notes

Conversations and generated notes are stored locally with session context.

Meetings and email

Provider-backed meeting creation and SMTP email flows use explicit previews and approvals.

Weather

Natural language city and date requests are normalized before provider execution.

Alarms

Exact-time and relative reminders are mapped to local scheduling tools on supported systems.

App launching

Installed desktop applications can be opened safely through desktop-entry discovery.
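
As an illustration of desktop-entry discovery, the sketch below reads .desktop files with Python's standard configparser module and builds a name-to-command map. It is a minimal sketch, not STN's actual implementation: the search directories and the helper names discover_apps and exec_to_argv are assumptions made for this example. Launching only names found in this map, and passing an argument list instead of a shell string, is one way to keep application launch safe.

```python
import configparser
import re
import shlex
from pathlib import Path

# Assumed search locations; a real implementation may also honor XDG_DATA_DIRS.
DEFAULT_DIRS = (
    Path("/usr/share/applications"),
    Path.home() / ".local/share/applications",
)
FIELD_CODE = re.compile(r"^%[a-zA-Z]$")  # placeholders such as %U or %f


def discover_apps(dirs=DEFAULT_DIRS):
    """Map lowercase application names to their Exec command lines."""
    apps = {}
    for directory in dirs:
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.desktop")):
            parser = configparser.ConfigParser(
                interpolation=None, strict=False, delimiters=("=",)
            )
            parser.optionxform = str  # keep key case (Name, Exec)
            try:
                parser.read(path, encoding="utf-8")
            except (configparser.Error, UnicodeDecodeError):
                continue  # skip malformed files instead of failing discovery
            if not parser.has_section("Desktop Entry"):
                continue
            entry = parser["Desktop Entry"]
            if entry.get("Type") != "Application":
                continue
            if entry.get("NoDisplay", fallback="false").lower() == "true":
                continue
            name, exec_line = entry.get("Name"), entry.get("Exec")
            if name and exec_line:
                apps[name.lower()] = exec_line
    return apps


def exec_to_argv(exec_line):
    """Split an Exec line into arguments and drop field-code placeholders."""
    return [tok for tok in shlex.split(exec_line) if not FIELD_CODE.match(tok)]
```

The resulting argument list can be handed to subprocess.Popen without shell=True, so a spoken application name is never interpolated into a shell command.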

Workflow

From natural language to verified action

STN is designed around a stateful execution loop. It does not discard incomplete commands; it tracks the active task and continues collecting only the missing fields.

01

Capture request

The Qt UI receives typed text or a transcript produced from microphone audio.

02

Extract intent

The orchestrator resolves the task type and converts natural language slots into strict JSON.

03

Complete parameters

If a field is missing, STN asks for that field and stores the pending state per chat.

04

Validate with agents

Deterministic agents normalize dates, addresses, providers, durations, and application names.

05

Execute through MCP

MCP tools validate inputs, call the local system or provider, log results, and return safe outputs.
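
The first three steps above can be sketched as a small stateful loop. This is a hedged illustration only: the intent names, slot names, the REQUIRED_SLOTS table, and the Orchestrator class are hypothetical and stand in for whatever schema STN actually uses. The sketch assumes the language model returns a JSON string with an optional "intent" and a "slots" object.

```python
import json
from dataclasses import dataclass, field

# Hypothetical required-slot table; STN's real intents and fields may differ.
REQUIRED_SLOTS = {
    "send_email": ["recipient", "subject", "body"],
    "get_weather": ["city", "date"],
}


@dataclass
class PendingTask:
    intent: str
    slots: dict = field(default_factory=dict)

    def missing(self):
        """Return required slot names that have no value yet."""
        return [n for n in REQUIRED_SLOTS[self.intent] if not self.slots.get(n)]


class Orchestrator:
    def __init__(self):
        self.pending = {}  # chat_id -> PendingTask

    def handle(self, chat_id, extraction_json):
        """Merge one extraction result into the chat's pending task."""
        data = json.loads(extraction_json)  # raises on malformed JSON
        if not isinstance(data, dict):
            return {"status": "invalid"}
        task = self.pending.get(chat_id)
        intent = data.get("intent")
        # Start a new task when none is active or the user switched intent.
        if task is None or (intent and intent != task.intent):
            if intent not in REQUIRED_SLOTS:
                return {"status": "unknown_intent"}
            task = PendingTask(intent)
            self.pending[chat_id] = task
        allowed = REQUIRED_SLOTS[task.intent]
        new_slots = data.get("slots", {})
        task.slots.update({k: v for k, v in new_slots.items() if k in allowed and v})
        missing = task.missing()
        if missing:
            # Ask for one field at a time and keep the task pending.
            return {"status": "need_input", "ask_for": missing[0]}
        del self.pending[chat_id]
        return {"status": "ready", "intent": task.intent, "slots": task.slots}
```

In this sketch, a request such as "weather in Ankara" yields a need_input result for the date, and the follow-up message only has to carry the date slot because the pending task is kept per chat.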

Architecture

A modular stack built for local-first task automation

The implementation separates presentation, language understanding, task logic, tool execution, and persistence. This keeps the assistant extensible without coupling the whole system to a single provider or action type.

Input: Desktop UI

Voice, typed chat, quick actions

Understanding: Orchestration

Speech-to-text, intent detection, parameter completion

Task Logic: Agents

Email, meeting, notes, weather, alarm, app launch

Execution: MCP Tools

Local OS actions, provider calls, safe logging

Result: Local State

Conversation memory, artifacts, user feedback

Orchestrator

Interprets the request, chooses the right task flow, tracks missing information, and handles approvals.

Agents

Apply deterministic task rules for email, meetings, notes, weather, alarms, and application launch.

MCP tool layer

Isolates real system and provider actions behind validated inputs, logs, timing, and safe error handling.

Local persistence

Keeps conversations, pending states, notes, meeting artifacts, email drafts, and tool activity on device.
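
The isolation role of the tool layer can be illustrated with a plain Python wrapper. This sketch does not use an MCP SDK and is not STN's code: the run_tool function, its arguments, and the logger name are assumptions for illustration. It shows the four behaviors named above: input validation, logging, timing, and error handling that does not leak internal details to the caller.

```python
import logging
import time

logger = logging.getLogger("stn.tools")  # hypothetical logger name


def run_tool(name, func, args, required):
    """Run a tool callable behind validation, timing, logging, and safe errors."""
    missing = [key for key in required if key not in args]
    if missing:
        return {"ok": False, "error": f"missing arguments: {', '.join(missing)}"}
    started = time.perf_counter()
    try:
        result = func(**args)
    except Exception as exc:
        # Log the full traceback locally; return only a generic message.
        logger.exception("tool %s failed", name)
        return {"ok": False, "error": f"{name} failed ({type(exc).__name__})"}
    elapsed_ms = (time.perf_counter() - started) * 1000
    logger.info("tool %s finished in %.1f ms", name, elapsed_ms)
    return {"ok": True, "result": result}
```

Returning a structured dictionary in both the success and failure case lets the orchestrator treat every tool the same way, regardless of which provider or system call sits behind it.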

Tech Stack

The implementation stack behind STN

STN combines a native Linux desktop interface, speech processing, language orchestration, modular tool execution, and local persistence into one maintainable product pipeline.

Core

Python

Main application logic, agent flows, integration code, and provider adapters.

Desktop

PySide / Pardus

Native Qt desktop UI designed and tested for Pardus GNU/Linux environments.

Input

Speech-to-Text

Microphone input is transcribed and passed into the same task pipeline as typed chat.

Reasoning

LLM Orchestration

Natural language requests are converted into intents, slots, follow-up questions, and approvals.

Execution

MCP Tools

System and provider actions are isolated behind validated tool interfaces and structured outputs.

State

Local Storage

Conversation history, artifacts, pending tasks, and user feedback stay on the device.

Providers

Email / Calendar / Weather

External service calls use previews, explicit confirmation where needed, and replaceable adapters.

Artifacts

Notes & Summaries

Requests, tool results, and meeting activity are turned into reusable notes and concise local summaries.
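
The combination of previews, explicit confirmation, and replaceable adapters could look like the sketch below. All names here (EmailDraft, EmailProvider, FakeProvider, send_with_approval) are hypothetical and chosen for this example; the real STN adapters are not shown in this page. The approve callback stands in for the UI step where the user accepts or rejects the preview.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class EmailDraft:
    recipient: str
    subject: str
    body: str

    def preview(self) -> str:
        """Render the text shown to the user before anything is sent."""
        return f"To: {self.recipient}\nSubject: {self.subject}\n\n{self.body}"


class EmailProvider(Protocol):
    """Interface any email adapter must satisfy (for example an SMTP adapter)."""

    def send(self, draft: EmailDraft) -> str: ...


class FakeProvider:
    """Stand-in adapter that records drafts instead of contacting a server."""

    def __init__(self):
        self.sent = []

    def send(self, draft: EmailDraft) -> str:
        self.sent.append(draft)
        return f"queued-{len(self.sent)}"


def send_with_approval(
    draft: EmailDraft,
    provider: EmailProvider,
    approve: Callable[[str], bool],
):
    """Send only after the approval callback accepts the preview."""
    if not approve(draft.preview()):
        return {"status": "cancelled"}
    return {"status": "sent", "id": provider.send(draft)}
```

Because send_with_approval depends only on the EmailProvider interface, a test double such as FakeProvider can replace a real adapter without changing the approval logic.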

Demo

A two-minute path through the main product features

The demo focuses on visible end-product behavior: natural input, parameter completion, approval, and real execution through modular tools.

Screenshots

Product screens

Alarm scheduling and email approval flow
Meeting setup with required details
Created meeting link and artifact panel
Weather agent response
Local application launch results

Video

Product demo video

Engineering

Built as a real desktop product, not only a chatbot demo

Local-first runtime

Offline-capable speech processing, local SQLite state, and optional external providers.

Approval boundaries

Email and meeting actions present structured previews before execution.

Provider flexibility

Meeting, email, weather, STT, and LLM components are isolated behind replaceable interfaces.

Regression coverage

Unit, integration, voice E2E, provider-validation, and packaging checks cover the demo paths.
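
As a rough picture of local SQLite state, the sketch below stores artifacts per chat using Python's built-in sqlite3 module. The table layout and the function names open_store, save_artifact, and list_artifacts are assumptions for illustration, not STN's actual schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "id INTEGER PRIMARY KEY, chat_id TEXT NOT NULL, kind TEXT NOT NULL,"
        " payload TEXT NOT NULL, created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_artifact(conn, chat_id, kind, payload):
    """Store one artifact as JSON and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO artifacts (chat_id, kind, payload) VALUES (?, ?, ?)",
            (chat_id, kind, json.dumps(payload)),
        )
    return cur.lastrowid


def list_artifacts(conn, chat_id):
    """Return (kind, payload) pairs for one chat in insertion order."""
    rows = conn.execute(
        "SELECT kind, payload FROM artifacts WHERE chat_id = ? ORDER BY id",
        (chat_id,),
    )
    return [(kind, json.loads(payload)) for kind, payload in rows]
```

Passing a file path instead of the default ":memory:" keeps the data on disk between sessions, which matches the on-device storage described above.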

Team

Project members and contact

Academic advisor: Şeyda Ertekin
External advisor: Mehmet Emin Fedar, TUBITAK