変更履歴

Name: CyberWhisper
Author: CyberWhisper

Version 0.4.0

バグ修正、UI 改善とインフラストラクチャ移行

全文検索の問題：

全文検索機能の問題を修正し、正確で信頼性の高い検索結果を確保しました。

拡張画面表示の問題：

拡張画面構成を使用する際の表示と配置の問題を修正しました。複数ディスプレイ環境でのウィンドウの動作と配置を改善しました。

エラーメッセージの表示：

エラーメッセージの表示とフォーマットの問題を修正しました。エラーメッセージの明確さとユーザーフィードバックメカニズムを改善しました。

UI スタイリングの調整：

一貫したブランド体験のために、アプリケーションのスタイリングをウェブサイトデザインに合わせて更新しました。すべてのアプリケーションインターフェース全体で視覚的一貫性を改善しました。

辞書とスニペットの検索：

検索の使いやすさを向上させるため、語彙とスニペットの大文字小文字を区別しない検索を実装しました。ユーザーは大文字小文字に関係なくエントリを見つけることができるようになりました。

インフラストラクチャ移行：

パフォーマンスと信頼性を向上させるため、新しいインフラストラクチャに移行しました。

Version 0.3.1

システムトレイアイコン、オンボーディングと辞書同期

システムトレイモードアイコン：

システムトレイの「モード選択」メニューで各モードにプリセットアイコン（絵文字）を追加し、視覚的な識別を容易にしました。すべてのプリセットタイプ（voice_to_text、message、note、mail、vibe_coding、meeting、custom）の包括的なアイコンマッピングを実装しました。アイコンはモードリストで使用されるプリセットアイコンと一致し、一貫したユーザー体験を提供します。

強化されたオンボーディング体験：

最初のモードを作成するための専用ステップをオンボーディングフローに追加しました。キーボードショートカットの設定をオンボーディングプロセスに直接統合しました。初回ユーザー体験中に基本的な設定を行うための段階的なガイダンスを提供します。

辞書スニペットサーバー同期：

辞書スニペットのサーバー側同期を実装しました。スニペットはバックアップとクロスデバイスアクセスのためにサーバーに自動的に同期されるようになりました。同期はユーザーのワークフローを中断することなく、バックグラウンドでシームレスに動作します。

高度な設定の拡張：

パワーユーザー向けにより細かい高度な設定を追加しました。追加のカスタマイズオプションで設定インターフェースを拡張しました。設定の分類と整理を改善し、ナビゲーションを容易にしました。

モードリストのユーザー体験：

より良い視覚的インジケーターとステータスフィードバックで視覚的フィードバックを改善しました。より直感的なコントロールとより良いエラーハンドリングでモード編集インターフェースを合理化しました。テーマに適したアイコンファイルを使用することで、ダークモードでの CyberWhisper プロバイダーアイコンの可視性を修正しました。すべてのプロバイダーアイコンがライトテーマとダークテーマで正しく表示されることを確認しました。

モード編集ワークフロー：

より良い検証とユーザーフィードバックでモード編集ワークフローを改善しました。モード管理インターフェース全体でプリセットアイコンの統合を改善しました。モード設定中のエラーハンドリングとユーザーメッセージングを強化しました。

Version 0.3.0

メッセージとノートプリセット、BYOK LLM と設定 UI リファクタリング

メッセージとノートプリセット：

転写を簡潔なチャットメッセージに変換し、トーンカスタマイズオプションを提供するメッセージプリセットを追加しました。転写を主要ポイント抽出付きの構造化ノートに要約するノートプリセットを実装しました。

フロンティア LLM の Bring Your Own Key (BYOK)：

フロンティア LLM モデルを使用するために独自の API キーを持参するサポートを実装しました。さまざまなプロバイダーからの OpenAI 互換 API エンドポイントのサポートを追加しました。カスタム LLM プロバイダーを追加、設定、管理するための包括的なプロバイダー管理インターフェース。接続の問題に関する詳細なエラーメッセージを含む自動 API キー検証。各カスタムプロバイダー向けの柔軟なモデル選択と設定。OpenAI、DeepSeek、Groq、Together AI、Perplexity、Longcat を含む複数の LLM プロバイダーのサポート。

設定 UI リファクタリング：

ユーザー体験を改善するため、設定インターフェースをモーダルウィンドウとしてリファクタリングしました。メインインターフェースから離れることなく設定にアクセスして設定できます。モーダルをすばやく閉じるための ESC キーサポートを追加しました。より良い組織と視覚的階層で設定レイアウトを強化しました。プロフェッショナルなモーダルプレゼンテーションのためのスムーズな背景ぼかしとフェードアニメーションを実装しました。

Version 0.2.2

プッシュトゥディクテーションモード、ハンズフリーモード、ハイブリッドショートカットマネージャー

プッシュトゥディクテーションモード：

プレスで録音、リリースで転写：キーを押すと録音を開始し、離すと完全な音声を転写するプッシュトゥディクテーションモードを実装。完全音声転写：離すと録音されたセグメント全体の完全な転写を実行し、意味の完全性とコンテキストの保持を確保。自動テキスト選択：キーを離すときに選択されたテキストを自動的にキャプチャして挿入し、シームレスなテキスト置換ワークフローを可能に。最小プレス時間：誤動作を防ぐための設定可能な最小プレス時間（デフォルト 150ms）。短い録音の音声パディング：非常に短い音声セグメントを自動的にパディングして、正確な転写を確保。プロフェッショナルなディクテーションツールと同様。

ハンズフリーモード：

継続的なリスニング：自動セグメンテーションとリアルタイム転写を備えた継続的な音声監視を有効化。インテリジェントな自動セグメンテーション：500ms の無音期間を自動検出して、音声を転写のための意味のあるチャンクにセグメント化。リアルタイムストリーミング転写：各セグメントが完了すると即座に転写結果を提供し、ライブ会話のキャプチャを可能に。修飾キーアクティベーション：追加のキーなしでハンズフリーモードをすばやくアクティブ化するための修飾キーのみの組み合わせ（例：⌘ + ⌥）をサポート。

ハイブリッドショートカットマネージャー：

Fn キーサポート：ネイティブ CGEventTap モニタリングを通じて、Fn キーをプッシュトゥディクテーションショートカットとして使用するサポートを追加。修飾キーのみの組み合わせ：追加のキーを必要とせずにハンズフリーモードをアクティブ化するための修飾キーのみの組み合わせ（例：⌘ + ⌥）を有効化。

モーダルダイアログの状態管理：

設定モーダルダイアログの状態管理を改善し、すべての設定インターフェースで一貫した動作とより良いユーザー体験を確保。

Version 0.2.1

新しいアプリケーションアイコン、スプラッシュページ、辞書機能

新しいアプリケーションアイコン：

視覚的アイデンティティの更新：視覚的一貫性を向上させた新しいアプリケーションアイコンデザインを実装。ブランド認識の強化：Dock、メニューバー、システム環境設定を含むすべてのシステム位置でアイコンを更新。

スプラッシュページとオンボーディングツアー：

初回ユーザー体験：新規ユーザー向けにスプラッシュページと包括的なオンボーディングツアーを追加。ガイド付きセットアッププロセス：主要な機能と機能性への段階的な紹介。ユーザーオンボーディングの改善：インタラクティブなチュートリアルで初期ユーザー体験を強化。

サイドバーの辞書機能：

統一された辞書アクセス：語彙とスニペット管理を統合するサイドバーに辞書エントリを追加。タブ付きインターフェース：語彙とスニペット間の簡単な切り替えのために、辞書ビュー内にタブ付きインターフェースを実装。合理化されたナビゲーション：関連機能をグループ化することで、サイドバーナビゲーションを簡素化。

フローティングアクションボタン (FAB) UI：

UI 移行：アクセシビリティとワークフローを改善するために、FAB ベースのユーザーインターフェースに切り替え。インタラクションの強化：FAB デザインでユーザーインタラクションパターンを改善。

高度なキーボードショートカットサポート：

プッシュトゥディクテーション用の Fn キー：Fn キーをプッシュトゥディクテーションショートカットとして使用するサポートを追加。ハンズフリーモード用の修飾キー組み合わせ：ハンズフリーモードのアクティベーションのために、修飾キーのみの組み合わせ（例：⌘ + ⌥）のサポートを実装。柔軟なショートカット設定：単一の修飾キーと複雑なキー組み合わせの両方をサポートするようにショートカットシステムを強化。

Version 0.2.0

ライブ転写、OOTB 音声モデル、パフォーマンス最適化

ライブ転写サポート：

HUD パネルにライブ転写表示を追加し、生成時にリアルタイム転写結果を表示。録音中に増分転写結果を表示するストリーミング転写更新を実装。

開封即用 (OOTB) 音声モデル：

初回起動時に自動的に事前設定されたモデルを使用するデフォルト音声モデル選択システムを実装。ユーザーはモデルのダウンロードを待つことなく、アプリケーションをすぐに使用開始でき、シームレスな初回体験を提供。

マイクと VAD パフォーマンス最適化：

より良いリソース管理と低遅延により、ネイティブオーディオキャプチャのパフォーマンスを改善。より高速な転写応答時間のために、オーディオ処理の遅延を最小化。

サイドバーとヘッダー UI 最適化：

より良い視覚的階層とよりスムーズな折りたたみ/展開アニメーションでサイドバー設計を改善。クイックアクセスのために統合されたマイクデバイスセレクターとテーマスイッチャーでページヘッダーを強化。アプリケーション全体でより良い間隔、タイポグラフィ、視覚的一貫性で UI コンポーネントを洗練。

Version 0.1.8

Audio Model Testing, HUD Enhancements & LLM Streaming

Audio Model Testing:

Added support to test audio models directly within the application. Users can now verify model performance and accuracy before using in production workflows.

HUD Panel Enhancements:

Added live audio waveform display in the HUD panel for visual feedback during recording. Implemented one-click copy functionality to quickly copy transcription content from the HUD panel. Enhanced HUD panel to show real-time transcription results directly in the panel interface.

LLM Streaming Support:

Added support to display LLM streaming responses in real-time. Users can now see LLM responses as they are generated, improving interaction feedback.

Manual Update Check:

Added manual update check functionality accessible from the sidebar. Users can now manually trigger update checks without waiting for automatic notifications.

Audio Device Detection Performance:

Improved performance and responsiveness of audio device detection. Reduced latency when scanning and listing available audio input devices. Optimized device detection to minimize system resource usage.

Microphone Settings Page Refactoring:

Refactored microphone settings page with better organization and user experience. Streamlined interface for selecting and configuring microphone devices. Improved visual design and information architecture for easier navigation.

Version 0.1.7

Google Sign-In & Always-On-Top HUD Panel

Google Account Sign-In Support:

Implemented Google OAuth authentication flow with secure code exchange for seamless account integration.

Always-On-Top HUD Panel:

Introduced a fully draggable HUD strip that can be repositioned anywhere on screen with elegant semi-transparent design that blends seamlessly with desktop content. The collapsible panel design features smooth expand/collapse animations, adjustable window sizes (Small, Medium, Large) for different use cases, and real-time opacity adjustment slider for customizing panel transparency. The non-activating design ensures the panel does not steal focus from other applications, maintaining workflow continuity.

HUD Light/Dark Theme Readability:

Comprehensive Light/Dark theme support for all HUD components ensuring optimal visibility and readability across different system themes.

Floating Action Button (FAB):

The Floating Action Button has been deprecated in favor of the new HUD panel. All FAB functionality has been integrated into the HUD panel with improved accessibility and features. The HUD panel provides a more native macOS experience with always-on-top capability and better visibility.

Version 0.1.6

Support Vocabulary & Recording Metadata Upgrades

Vocabulary & Misspelling Toolkit:

Customize domain-specific terminology lists and casing rules for precise transcriptions. Define common misspellings with automatic correction to reduce manual cleanup.

Recording History Metadata:

Capture and display sessionId, requestId, and the associated preset for faster troubleshooting. Include the new metadata fields in exports to support downstream analytics.

Version 0.1.5

User Profiles, Diagnostics & Header Experience

User Profile Management:

Introduced full profile display on the Profile page with name, gender, birth year, and profession details plus an editable form under Settings > Account.

User Avatar Enhancement:

Refined the avatar dropdown to highlight the full name with improved typography for clearer identity cues.

Profile Data Synchronization:

Added real-time loading and saving with consistent loading indicators and robust error handling.

Signed App Permission Validation:

Hardened notarized build entitlement checks with actionable error messaging and telemetry for signature failures.

Panic Hook Diagnostics:

Expanded the panic hook to capture structured stack traces and thread metadata while surfacing crash summaries and auto-restarting background workers.

Contextual Metadata Collection:

Gathered richer runtime context—including foreground app, OS build, and hardware model—to improve crash and feedback payload quality.

Header Layout Improvements:

Added a microphone selector and integrated theme switcher directly into the header for faster access.

Settings Page Refinements:

Added a birth-year dropdown, richer profession options, and unified loading indicators across Profile and Settings.

Version 0.1.4

Search Functionality & Data Synchronization Improvements

New Record Searchability:

Fixed issue where newly added voice records were not searchable due to missing user_id filtering in search queries.

User ID Synchronization:

Enhanced user_id parsing and storage from JWT access tokens to ensure proper record association.

Search Query Optimization:

Improved search logic to correctly handle records with NULL user_id values while maintaining backward compatibility.

Orphaned Record Cleanup:

Enhanced resync and reindex logic to automatically detect and remove orphaned database records that no longer have corresponding files.

Cross-User Cleanup:

Improved orphan cleanup to handle both authenticated and anonymous user records during synchronization.

User ID Recovery:

Added automatic user_id recovery for records that were incorrectly stored with NULL values by analyzing file system paths.

Path Management Refactoring:

Separated directory path retrieval from directory creation to prevent unintended directory creation during deletion operations.

Version 0.1.3

Enhanced User Experience & Advanced Search Capabilities

Processing Flow Visualization:

Added visual processing flow display when using modes, showing the complete pipeline from speech input to LLM processing or text output.

Enhanced Mode Editing:

Implemented comprehensive mode detail editing with click-to-edit functionality, allowing users to modify preset settings, voice models, LLM configurations, and advanced options.

Full-Text Search:

Implemented comprehensive full-text search across recording history, supporting search in transcriptions, titles, and LLM-generated content.

Performance Optimization:

Enhanced search performance with optimized query execution.

Advanced Filtering:

Added status-based filtering (completed, processing, error) with filter application.

History Re-indexing:

Added manual re-indexing functionality to rebuild search indexes and sync file system records with database.

Infinite Scroll:

Implemented seamless infinite scrolling for recording history with automatic pagination, reducing initial load time and improving user experience for large datasets.

Dock Icon:

Updated macOS Dock icon with new design and improved visual consistency.

System Tray Icon:

Enhanced system tray icon with better visibility and template support.

High-Resolution Assets:

Included @2x and @3x variants for Retina displays and various screen densities.

Version 0.1.2

Enhanced Infrastructure & System-wide Integration

Mirror Support:

Added alternative download mirrors for improved reliability and speed.

Faster Downloads:

Optimized download performance with multiple mirror sources.

Better Availability:

Reduced download failures with redundant mirror support.

Complete Model Catalog:

Full access to all CyberWhisper Cloud models through REST API.

Dynamic Model Loading:

Real-time model list fetching from CyberWhisper Cloud API.

Enhanced Model Selection:

Support for 6+ models including CyberWhisper Fast (ultra-fast response model for real-time conversations), CyberWhisper Flash (lightning-fast model for simple tasks), GPT-5 Nano (OpenAI's latest lightweight model with balanced performance), GPT-4o Mini (efficient OpenAI model for daily tasks), DeepSeek V3.1 (advanced reasoning capabilities with latest DeepSeek technology), and Gemini 2.5 Flash Lite (Google's ultra-fast lightweight model for real-time applications).

Mode Selection:

Quick mode switching directly from system tray.

Microphone Management:

Easy microphone device selection from tray menu.

System-wide Access:

Control CyberWhisper from anywhere in your system.

Quick Actions:

Essential functions accessible without opening the main window.

Version 0.1.1

Command Palette & Global Shortcuts

Smart Mode Search:

Search and activate modes using intelligent keyword matching.

Multi-criteria Filtering:

Find modes by preset names, voice models, LLM models, or descriptions.

Detailed Mode Information:

Display comprehensive mode details including preset, voice model, LLM model, input/output languages, and feature settings.

Global Shortcut Access:

Open Command Palette from anywhere with ⌘ + ⇧ + K.

Fuzzy Search:

Intelligent search that matches partial keywords and related terms.

Real-time Filtering:

Instant results as you type with live mode filtering.

Visual Mode Status:

Clear indication of active vs inactive modes with status badges.

Keyboard Navigation:

Full keyboard support with arrow keys and Enter to activate.

Added support for customizable global keyboard shortcuts.

Toggle recording with customizable shortcut (default: ⌥ + N).

Cancel Recording:

Cancel ongoing recording with Esc key.

Change Mode:

Quick mode switching with global shortcut (default: ⌘ + ⇧ + K). Shortcuts work system-wide, even when the app is not in focus. Fallback shortcut registration for better compatibility across different systems.

Version 0.1.0

Download Base Speech Models, Modes & Presets, and History Viewer

Download Base Speech Models

Support for downloading and running basic on-device speech-to-text models. This feature enables offline transcription capabilities and improved privacy for users who prefer to keep their audio data local.

Modes & Presets

Introduced Presets for Voice to Text and Message workflows, making it easier to configure transcription and usage modes. Users can now quickly switch between different processing modes without manual configuration.

History Viewer

Access and review your past transcriptions and interactions directly within the app. The History Viewer provides a comprehensive timeline of all your speech-to-text activities, making it easy to find and reference previous work.