변경 로그

Version 0.2.2

누르기 딕테이션 모드, 핸즈프리 모드 및 하이브리드 단축키 관리자

누르기 딕테이션 모드:

누르면 녹음, 놓으면 전사: 키를 누르면 녹음을 시작하고 놓으면 전체 오디오를 전사하는 누르기 딕테이션 모드를 구현. 전체 오디오 전사: 놓을 때 녹음된 세그먼트 전체의 완전한 전사를 수행하여 의미론적 완전성과 컨텍스트 보존을 보장. 자동 텍스트 선택: 키를 놓을 때 선택된 텍스트를 자동으로 캡처하고 삽입하여 원활한 텍스트 교체 워크플로우를 가능하게 함. 최소 누름 지속 시간: 실수로 인한 활성화를 방지하기 위한 구성 가능한 최소 누름 지속 시간(기본값 150ms). 짧은 녹음의 오디오 패딩: 매우 짧은 오디오 세그먼트를 자동으로 패딩하여 정확한 전사를 보장하며, 전문 딕테이션 도구와 유사함.

핸즈프리 모드:

연속 듣기: 자동 세그멘테이션 및 실시간 전사가 있는 연속 오디오 모니터링을 활성화. 지능형 자동 세그멘테이션: 전사를 위해 음성을 의미 있는 청크로 세그먼트화하기 위해 500ms 침묵 기간을 자동으로 감지. 실시간 스트리밍 전사: 각 세그먼트가 완료되면 즉시 전사 결과를 제공하여 라이브 대화 캡처를 가능하게 함. 수정자 키 활성화: 추가 키 없이 핸즈프리 모드를 빠르게 활성화하기 위한 수정자 전용 키 조합(예: ⌘ + ⌥)을 지원.

하이브리드 단축키 관리자:

Fn 키 지원: 네이티브 CGEventTap 모니터링을 통해 Fn 키를 누르기 딕테이션 단축키로 사용하는 지원을 추가. 수정자 전용 조합: 추가 키를 요구하지 않고 핸즈프리 모드 활성화를 위한 수정자 전용 키 조합(예: ⌘ + ⌥)을 활성화.

모달 대화상자 상태 관리:

설정 모달 대화상자의 상태 관리를 개선하여 모든 구성 인터페이스에서 일관된 동작과 더 나은 사용자 경험을 보장.

Version 0.2.1

새 애플리케이션 아이콘, 스플래시 페이지 및 사전 기능

새 애플리케이션 아이콘:

시각적 아이덴티티 업데이트: 향상된 시각적 일관성을 가진 새로운 애플리케이션 아이콘 디자인을 구현. 브랜드 인식 향상: Dock, 메뉴 바 및 시스템 환경 설정을 포함한 모든 시스템 위치에서 아이콘을 업데이트.

스플래시 페이지 및 온보딩 투어:

첫 사용자 경험: 신규 사용자를 위한 스플래시 페이지와 포괄적인 온보딩 투어를 추가. 가이드 설정 프로세스: 주요 기능 및 기능성에 대한 단계별 소개. 사용자 온보딩 개선: 대화형 튜토리얼로 초기 사용자 경험을 향상.

사이드바의 사전 기능:

통합 사전 액세스: 어휘 및 스니펫 관리를 통합하는 사이드바에 사전 항목을 추가. 탭 인터페이스: 어휘와 스니펫 간의 쉬운 전환을 위해 사전 보기 내에 탭 인터페이스를 구현. 간소화된 탐색: 관련 기능을 그룹화하여 사이드바 탐색을 단순화.

플로팅 작업 버튼 (FAB) UI:

UI 전환: 향상된 접근성과 워크플로우를 위해 FAB 기반 사용자 인터페이스로 전환. 향상된 상호작용: FAB 디자인으로 사용자 상호작용 패턴을 개선.

고급 키보드 단축키 지원:

누르기 딕테이션용 Fn 키: Fn 키를 누르기 딕테이션 단축키로 사용하는 지원을 추가. 핸즈프리 모드용 수정자 키 조합: 핸즈프리 모드 활성화를 위해 수정자 전용 키 조합(예: ⌘ + ⌥) 지원을 구현. 유연한 단축키 구성: 단일 수정자 키와 복잡한 키 조합을 모두 지원하도록 단축키 시스템을 향상.

Version 0.2.0

실시간 전사, OOTB 음성 모델 및 성능 최적화

실시간 전사 지원:

HUD 패널에 실시간 전사 표시를 추가하여 생성되는 대로 실시간 전사 결과를 표시. 녹음 중 증분 전사 결과를 표시하는 스트리밍 전사 업데이트를 구현.

개봉 즉시 사용 가능 (OOTB) 음성 모델:

첫 실행 시 자동으로 사전 구성된 모델을 사용하는 기본 음성 모델 선택 시스템을 구현. 사용자는 모델 다운로드를 기다리지 않고 즉시 애플리케이션을 사용할 수 있어 원활한 첫 경험을 제공.

마이크 및 VAD 성능 최적화:

더 나은 리소스 관리 및 감소된 지연 시간으로 네이티브 오디오 캡처 성능을 개선. 더 빠른 전사 응답 시간을 위해 오디오 처리 지연 시간을 최소화.

사이드바 및 헤더 UI 최적화:

더 나은 시각적 계층 구조와 더 부드러운 접기/펼치기 애니메이션으로 사이드바 디자인을 개선. 빠른 액세스를 위해 통합된 마이크 장치 선택기 및 테마 전환기로 페이지 헤더를 향상. 애플리케이션 전체에서 더 나은 간격, 타이포그래피 및 시각적 일관성으로 UI 구성 요소를 개선.

Version 0.1.8

Audio Model Testing, HUD Enhancements & LLM Streaming

Audio Model Testing:

Added support to test audio models directly within the application. Users can now verify model performance and accuracy before using in production workflows.

HUD Panel Enhancements:

Added live audio waveform display in the HUD panel for visual feedback during recording. Implemented one-click copy functionality to quickly copy transcription content from the HUD panel. Enhanced HUD panel to show real-time transcription results directly in the panel interface.

LLM Streaming Support:

Added support to display LLM streaming responses in real-time. Users can now see LLM responses as they are generated, improving interaction feedback.

Manual Update Check:

Added manual update check functionality accessible from the sidebar. Users can now manually trigger update checks without waiting for automatic notifications.

Audio Device Detection Performance:

Improved performance and responsiveness of audio device detection. Reduced latency when scanning and listing available audio input devices. Optimized device detection to minimize system resource usage.

Microphone Settings Page Refactoring:

Refactored microphone settings page with better organization and user experience. Streamlined interface for selecting and configuring microphone devices. Improved visual design and information architecture for easier navigation.

Version 0.1.7

Google Sign-In & Always-On-Top HUD Panel

Google Account Sign-In Support:

Implemented Google OAuth authentication flow with secure code exchange for seamless account integration.

Always-On-Top HUD Panel:

Introduced a fully draggable HUD strip that can be repositioned anywhere on screen with elegant semi-transparent design that blends seamlessly with desktop content. The collapsible panel design features smooth expand/collapse animations, adjustable window sizes (Small, Medium, Large) for different use cases, and real-time opacity adjustment slider for customizing panel transparency. The non-activating design ensures the panel does not steal focus from other applications, maintaining workflow continuity.

HUD Light/Dark Theme Readability:

Comprehensive Light/Dark theme support for all HUD components ensuring optimal visibility and readability across different system themes.

Floating Action Button (FAB):

The Floating Action Button has been deprecated in favor of the new HUD panel. All FAB functionality has been integrated into the HUD panel with improved accessibility and features. The HUD panel provides a more native macOS experience with always-on-top capability and better visibility.

Version 0.1.6

Support Vocabulary & Recording Metadata Upgrades

Vocabulary & Misspelling Toolkit:

Customize domain-specific terminology lists and casing rules for precise transcriptions. Define common misspellings with automatic correction to reduce manual cleanup.

Recording History Metadata:

Capture and display sessionId, requestId, and the associated preset for faster troubleshooting. Include the new metadata fields in exports to support downstream analytics.

Version 0.1.5

User Profiles, Diagnostics & Header Experience

User Profile Management:

Introduced full profile display on the Profile page with name, gender, birth year, and profession details plus an editable form under Settings > Account.

User Avatar Enhancement:

Refined the avatar dropdown to highlight the full name with improved typography for clearer identity cues.

Profile Data Synchronization:

Added real-time loading and saving with consistent loading indicators and robust error handling.

Signed App Permission Validation:

Hardened notarized build entitlement checks with actionable error messaging and telemetry for signature failures.

Panic Hook Diagnostics:

Expanded the panic hook to capture structured stack traces and thread metadata while surfacing crash summaries and auto-restarting background workers.

Contextual Metadata Collection:

Gathered richer runtime context—including foreground app, OS build, and hardware model—to improve crash and feedback payload quality.

Header Layout Improvements:

Added a microphone selector and integrated theme switcher directly into the header for faster access.

Settings Page Refinements:

Added a birth-year dropdown, richer profession options, and unified loading indicators across Profile and Settings.

Version 0.1.4

Search Functionality & Data Synchronization Improvements

New Record Searchability:

Fixed issue where newly added voice records were not searchable due to missing user_id filtering in search queries.

User ID Synchronization:

Enhanced user_id parsing and storage from JWT access tokens to ensure proper record association.

Search Query Optimization:

Improved search logic to correctly handle records with NULL user_id values while maintaining backward compatibility.

Orphaned Record Cleanup:

Enhanced resync and reindex logic to automatically detect and remove orphaned database records that no longer have corresponding files.

Cross-User Cleanup:

Improved orphan cleanup to handle both authenticated and anonymous user records during synchronization.

User ID Recovery:

Added automatic user_id recovery for records that were incorrectly stored with NULL values by analyzing file system paths.

Path Management Refactoring:

Separated directory path retrieval from directory creation to prevent unintended directory creation during deletion operations.

Version 0.1.3

Enhanced User Experience & Advanced Search Capabilities

Processing Flow Visualization:

Added visual processing flow display when using modes, showing the complete pipeline from speech input to LLM processing or text output.

Enhanced Mode Editing:

Implemented comprehensive mode detail editing with click-to-edit functionality, allowing users to modify preset settings, voice models, LLM configurations, and advanced options.

Full-Text Search:

Implemented comprehensive full-text search across recording history, supporting search in transcriptions, titles, and LLM-generated content.

Performance Optimization:

Enhanced search performance with optimized query execution.

Advanced Filtering:

Added status-based filtering (completed, processing, error) with filter application.

History Re-indexing:

Added manual re-indexing functionality to rebuild search indexes and sync file system records with database.

Infinite Scroll:

Implemented seamless infinite scrolling for recording history with automatic pagination, reducing initial load time and improving user experience for large datasets.

Dock Icon:

Updated macOS Dock icon with new design and improved visual consistency.

System Tray Icon:

Enhanced system tray icon with better visibility and template support.

High-Resolution Assets:

Included @2x and @3x variants for Retina displays and various screen densities.

Version 0.1.2

Enhanced Infrastructure & System-wide Integration

Mirror Support:

Added alternative download mirrors for improved reliability and speed.

Faster Downloads:

Optimized download performance with multiple mirror sources.

Better Availability:

Reduced download failures with redundant mirror support.

Complete Model Catalog:

Full access to all CyberWhisper Cloud models through REST API.

Dynamic Model Loading:

Real-time model list fetching from CyberWhisper Cloud API.

Enhanced Model Selection:

Support for 6+ models including CyberWhisper Fast (ultra-fast response model for real-time conversations), CyberWhisper Flash (lightning-fast model for simple tasks), GPT-5 Nano (OpenAI's latest lightweight model with balanced performance), GPT-4o Mini (efficient OpenAI model for daily tasks), DeepSeek V3.1 (advanced reasoning capabilities with latest DeepSeek technology), and Gemini 2.5 Flash Lite (Google's ultra-fast lightweight model for real-time applications).

Mode Selection:

Quick mode switching directly from system tray.

Microphone Management:

Easy microphone device selection from tray menu.

System-wide Access:

Control CyberWhisper from anywhere in your system.

Quick Actions:

Essential functions accessible without opening the main window.

Version 0.1.1

Command Palette & Global Shortcuts

Smart Mode Search:

Search and activate modes using intelligent keyword matching.

Multi-criteria Filtering:

Find modes by preset names, voice models, LLM models, or descriptions.

Detailed Mode Information:

Display comprehensive mode details including preset, voice model, LLM model, input/output languages, and feature settings.

Global Shortcut Access:

Open Command Palette from anywhere with ⌘ + ⇧ + K.

Fuzzy Search:

Intelligent search that matches partial keywords and related terms.

Real-time Filtering:

Instant results as you type with live mode filtering.

Visual Mode Status:

Clear indication of active vs inactive modes with status badges.

Keyboard Navigation:

Full keyboard support with arrow keys and Enter to activate.

Added support for customizable global keyboard shortcuts.

Toggle recording with customizable shortcut (default: ⌥ + N).

Cancel Recording:

Cancel ongoing recording with Esc key.

Change Mode:

Quick mode switching with global shortcut (default: ⌘ + ⇧ + K). Shortcuts work system-wide, even when the app is not in focus. Fallback shortcut registration for better compatibility across different systems.

Version 0.1.0

Download Base Speech Models, Modes & Presets, and History Viewer

Download Base Speech Models

Support for downloading and running basic on-device speech-to-text models. This feature enables offline transcription capabilities and improved privacy for users who prefer to keep their audio data local.

Modes & Presets

Introduced Presets for Voice to Text and Message workflows, making it easier to configure transcription and usage modes. Users can now quickly switch between different processing modes without manual configuration.

History Viewer

Access and review your past transcriptions and interactions directly within the app. The History Viewer provides a comprehensive timeline of all your speech-to-text activities, making it easy to find and reference previous work.