Purpose of Layer 8
Layer 8 is the multimodal bridge in SSCA v7/v8 — it extends the core semantic compression engine beyond pure text and structured data to handle images, video, and audio by converting them into compressible scene graphs and semantic triples.
This makes SSCA a true hybrid visual-semantic compressor, combining the strengths of traditional image/video codecs (H.264/H.265, AVIF, Opus) with SSCA’s lossless meaning-layer efficiency.
Traditional compression treats pixels or waveforms as raw data — SSCA Layer 8 understands what the media means:
- Extracts objects, actions, relations, and context from visual/audio input
- Builds a scene graph (nodes = objects/attributes, edges = relations over time)
- Feeds the graph into SSCA’s core pipeline (Layers 1–9) for semantic compression
- Preserves meaning losslessly while traditional codecs handle the perceptual layer
Result: 20–40% additional savings on full multimedia streams, plus searchable, queryable meaning (e.g., “find all frames with person holding phone”).
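As a minimal sketch of the text-side idea, the pipeline above can be illustrated by serializing a scene graph as semantic triples and compressing it losslessly. SSCA's Layers 1–9 are not public, so `zlib` stands in for the core compressor here, and the triple schema is purely illustrative:

```python
import json
import zlib

# Illustrative scene graph as (subject, relation, object) triples.
# The schema is an assumption, not SSCA's actual format.
triples = [
    ("person_1", "holding", "phone_1"),
    ("person_1", "near", "car_1"),
    ("car_1", "has_attribute", "red"),
]

# Serialize the graph, then compress it losslessly.
# zlib is a stand-in for SSCA Layers 1-9.
graph_json = json.dumps(triples).encode("utf-8")
compressed = zlib.compress(graph_json, level=9)

# Lossless round trip: the meaning layer is recovered exactly.
restored = zlib.decompress(compressed)
assert restored == graph_json
```

The point is the round trip: whatever compressor sits underneath, the meaning layer must decompress byte-for-byte, while the perceptual layer (AVIF/Opus) is free to be lossy.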
How Layer 8 Works – High-Level Flowchart
Input: Image • Video frame • Audio clip
  │
  ├─► 1. Extraction
  │     ├─ Images/Video: OpenPSG / HIERCOM / STKET → temporal scene graphs
  │     └─ Audio: Whisper transcripts + event detection (speech, laughter, music) → semantic triples
  │
  ├─► 2. Graph Construction
  │     ├─ Nodes: objects (car), attributes (red, moving)
  │     └─ Edges: spatial (near), temporal (before/during), actions (holding, walking toward)
  │
  ├─► 3. Compression
  │     ├─ Feed graph to SSCA Layers 1–9
  │     ├─ Compress graph to 15–30% of JSON size (vs 40–60% with Brotli)
  │     └─ Store alongside perceptual media (AVIF for images, Opus for audio)
  │
  └─► 4. Decompression (reverse)
        ├─ Reconstruct graph losslessly
        ├─ Combine with decompressed perceptual media
        └─ Enable semantic search (“person near car”) without a full media scan
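Step 4's semantic search can be sketched as a query over the reconstructed per-frame graphs. The frame layout, identifier naming, and `find_frames` helper below are all hypothetical, chosen only to show why no media scan is needed:

```python
# Hypothetical per-frame scene graphs after lossless reconstruction.
# Keys are frame indices; values are (subject, relation, object) triples.
frames = {
    0: [("person_1", "holding", "phone_1")],
    1: [("person_1", "near", "car_1")],
    2: [("person_1", "holding", "phone_1"), ("dog_1", "near", "car_1")],
}

def find_frames(frames, subject_prefix, relation, object_prefix):
    """Return frame ids containing a triple that matches the query.

    Prefix matching lets "person" match instance ids like "person_1".
    """
    hits = []
    for frame_id, triples in sorted(frames.items()):
        for s, r, o in triples:
            if (s.startswith(subject_prefix)
                    and r == relation
                    and o.startswith(object_prefix)):
                hits.append(frame_id)
                break  # one match per frame is enough
    return hits

print(find_frames(frames, "person", "holding", "phone"))  # → [0, 2]
```

The query touches only the compact graph index; the AVIF/Opus payloads are never decoded, which is the whole point of storing meaning separately from pixels.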
On edge devices, Layer 0 auto-selects lightweight extraction models to save power.
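A minimal sketch of that selection policy might look like the following; the thresholds, model tiers, and function name are assumptions for illustration, not part of SSCA's Layer 0:

```python
# Hypothetical Layer 0-style model selection for extraction.
# Thresholds and tier names are illustrative assumptions.
def select_extraction_model(power_budget_mw: int, has_gpu: bool) -> str:
    """Pick an extraction tier from a device power budget (milliwatts)."""
    if has_gpu and power_budget_mw >= 5000:
        return "full-scene-graph"       # OpenPSG-class model
    if power_budget_mw >= 1000:
        return "distilled-scene-graph"  # smaller distilled model
    return "keyframe-objects-only"      # cheapest fallback on edge

print(select_extraction_model(500, False))   # → keyframe-objects-only
print(select_extraction_model(6000, True))   # → full-scene-graph
```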
Layer 8 turns SSCA from a text compressor into a true multimodal semantic engine — the foundation for next-gen video, AR, and thought-to-text systems.