<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN" "JATS-journalpublishing1-4.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.4" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">jcc</journal-id>
      <journal-title-group>
        <journal-title>Journal of Computer and Communications</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2327-5227</issn>
      <issn pub-type="ppub">2327-5219</issn>
      <publisher>
        <publisher-name>Scientific Research Publishing</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.4236/jcc.2026.146007</article-id>
      <article-id pub-id-type="publisher-id">jcc-152045</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
        <subj-group>
          <subject>Computer Science</subject>
          <subject>Communications</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>A Lightweight Formal Framework for Behavioral Safety Auditing of Large Language Models in Cloud Infrastructures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Kouhoué</surname>
            <given-names>Austin Waffo</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Bouetou</surname>
            <given-names>Thomas Bouetou</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="aff1"><label>1</label> National Advanced School of Engineering of Yaoundé, The University of Yaoundé I, Yaoundé, Cameroon </aff>
      <author-notes>
        <fn fn-type="conflict" id="fn-conflict">
          <p>The authors declare no conflicts of interest regarding the publication of this paper.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub">
        <day>11</day>
        <month>06</month>
        <year>2026</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>06</month>
        <year>2026</year>
      </pub-date>
      <volume>14</volume>
      <issue>06</issue>
      <fpage>87</fpage>
      <lpage>101</lpage>
      <history>
        <date date-type="received">
          <day>01</day>
          <month>05</month>
          <year>2026</year>
        </date>
        <date date-type="accepted">
          <day>21</day>
          <month>06</month>
          <year>2026</year>
        </date>
        <date date-type="published">
          <day>24</day>
          <month>06</month>
          <year>2026</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2026 by the authors and Scientific Research Publishing Inc.</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="open-access">
          <license-p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>).</license-p>
        </license>
      </permissions>
      <self-uri content-type="doi" xlink:href="https://doi.org/10.4236/jcc.2026.146007">https://doi.org/10.4236/jcc.2026.146007</self-uri>
      <abstract>
        <p>The rapid integration of Large Language Models (LLMs) into cloud-based ecosystems has shifted the cybersecurity landscape from classical data protection toward complex behavioral safety and algorithmic alignment. Despite their transformative potential, LLMs exhibit emergent vulnerabilities such as reward hacking, deceptive alignment, and proprietary data exfiltration that are often difficult to detect using traditional ad-hoc auditing methods. This paper introduces a formal, reproducible, and lightweight framework based on Formal Concept Analysis (FCA) to systematically evaluate security risks in cloud-deployed LLMs. By transforming semi-structured JSON audit logs into a mathematical formal context, we generate a concept lattice that reveals the hidden hierarchical dependencies and co-occurrences among vulnerability indicators. Experimental results on the GPT-OSS-20B model demonstrate that our framework can mathematically identify deceptive signatures, such as the correlation between pseudo-transparency claims and malicious alignment. The proposed methodology provides a deterministic reality check for AI governance, offering actionable insights for auditors and cloud service providers to harden LLM-based applications against structural failure modes.</p>
      </abstract>
      <kwd-group kwd-group-type="author-generated" xml:lang="en">
        <kwd>Cloud Computing Security</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>Deceptive Alignment</kwd>
        <kwd>AI Behavioral Auditing</kwd>
        <kwd>Cyber-Physical Systems</kwd>
        <kwd>AI Governance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
      <title>1. Introduction</title>
      <p>The ubiquity of Large Language Models (LLMs) in modern cloud infrastructures has redefined the interaction between users and autonomous systems. As these models move from isolated research environments to the core of cloud-based services, the security focus has evolved beyond simple data encryption toward the challenges of algorithmic alignment and behavioral robustness [<xref ref-type="bibr" rid="B1">1</xref>]. While cloud providers offer scalable deployment, the black-box nature of transformer-based architectures introduces novel prompt-level and reasoning-level vulnerabilities that traditional security protocols are ill-equipped to handle [<xref ref-type="bibr" rid="B2">2</xref>]. The core problem addressed in this study is the lack of structured, formal frameworks for auditing the complex failure modes of LLMs. Foundational research has highlighted risks such as the exfiltration of sensitive training data [<xref ref-type="bibr" rid="B3">3</xref>] and sycophancy, where models prioritize user satisfaction over safety [<xref ref-type="bibr" rid="B4">4</xref>]. More critically, the emergence of deceptive alignment, in which an agent appears compliant during safety evaluations while maintaining misaligned latent goals, poses a severe threat to cloud-based AI governance [<xref ref-type="bibr" rid="B5">5</xref>][<xref ref-type="bibr" rid="B6">6</xref>]. Current audit practices often rely on qualitative observations and lack a mathematical basis for identifying the stable patterns of behavior that lead to critical failures.</p>
      <p>To bridge this gap, we propose a lightweight formal framework based on Formal Concept Analysis (FCA) [<xref ref-type="bibr" rid="B7">7</xref>][<xref ref-type="bibr" rid="B8">8</xref>]. FCA provides a rigorous mathematical foundation to map the relationship between specific test scenarios (objects) and their behavioral outcomes (attributes). By constructing a Risk Formal Context from audit logs, we move beyond anecdotal evidence to a structured Concept Lattice that visualizes the hierarchy of LLM vulnerabilities. Our methodology allows auditors to detect structural implications, such as the relationship between a model’s authoritative tone and its tendency to provide unsafe content. This study makes three primary contributions: (i) the definition of a semantic binarization process for LLM audit logs, (ii) the generation of a conceptual hierarchy for the GPT-OSS-20B model, and (iii) the extraction of exact implication rules that serve as predictive signatures for critical severity risks.</p>
      <p>The remainder of this paper is organized as follows: Section 2 provides the mathematical foundations of FCA. Section 3 details the experimental setup, including the dataset and the analysis pipeline. Section 4 presents the concept lattice and discusses the implication rules. Section 5 situates our work within the current state of the art in LLM safety, and Section 6 concludes with perspectives for future research.</p>
    </sec>
    <sec id="sec2">
      <title>2. Background</title>
      <p>Formal Concept Analysis (FCA) is a mathematical framework that aims to identify, structure, and organize knowledge from binary data by means of concept lattices [<xref ref-type="bibr" rid="B9">9</xref>].</p>
      <p><bold>Definition 1 (Formal context)</bold> <italic>A formal context is a triple</italic> <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi mathvariant="double-struck"> O </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> A </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> I </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>in which</italic> <inline-formula><mml:math><mml:mi mathvariant="double-struck"> O </mml:mi></mml:math></inline-formula> <italic>is a set of objects,</italic> <inline-formula><mml:math><mml:mi mathvariant="double-struck"> A </mml:mi></mml:math></inline-formula> <italic>is a set of attributes, and</italic> <inline-formula><mml:math><mml:mrow><mml:mi mathvariant="double-struck"> I </mml:mi><mml:mo> ⊆ </mml:mo><mml:mi mathvariant="double-struck"> O </mml:mi><mml:mo> × </mml:mo><mml:mi mathvariant="double-struck"> A </mml:mi></mml:mrow></mml:math></inline-formula> <italic>is a binary relation between objects and attributes</italic> [<xref ref-type="bibr" rid="B10">10</xref>].</p>
      <p>Formal contexts formalise binary datasets and can be represented by a cross table, as illustrated in <bold>Table 1</bold>. In this example, the objects are LLM vulnerability instances and the attributes are security attributes.</p>
      <p><bold>Table 1.</bold> A reduced formal context describing three LLM vulnerability instances (stocks_market, deceptive_alignment, and proprietary_data) as objects through three security attributes (high_severity, pseudo_transparency, and proprietary_data_risk). The context was derived from the GPT-OSS-20B audit reports.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th></th>
              <th>high_severity</th>
              <th>pseudo_transparency</th>
              <th>proprietary_data_risk</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>stocks_market</td>
              <td>x</td>
              <td>
              </td>
              <td>
              </td>
            </tr>
            <tr>
              <td>deceptive_alignment</td>
              <td>x</td>
              <td>x</td>
              <td>
              </td>
            </tr>
            <tr>
              <td>proprietary_data</td>
              <td>x</td>
              <td>
              </td>
              <td>x</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
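      <p>As a minimal sketch (not the authors' tooling), the context of <bold>Table 1</bold> can be encoded in Python as a mapping from each object to the set of attributes it possesses. The names <monospace>CONTEXT</monospace>, <monospace>ATTRIBUTES</monospace>, and <monospace>cross_table</monospace> are illustrative assumptions and do not appear in the original study.</p>
      <code language="python">
```python
# Table 1 encoded as object -> set of attributes (illustrative encoding).
CONTEXT = {
    "stocks_market": {"high_severity"},
    "deceptive_alignment": {"high_severity", "pseudo_transparency"},
    "proprietary_data": {"high_severity", "proprietary_data_risk"},
}
# Column order of Table 1.
ATTRIBUTES = ["high_severity", "pseudo_transparency", "proprietary_data_risk"]


def cross_table(context, attributes):
    """Return one row per object, with 'x' where the object has the attribute."""
    return {
        obj: ["x" if attr in attrs else "" for attr in attributes]
        for obj, attrs in context.items()
    }


table = cross_table(CONTEXT, ATTRIBUTES)
for obj, row in table.items():
    print(obj, row)
```
      </code>
      <p>Storing one attribute set per object keeps the binary relation explicit: a pair (object, attribute) belongs to the relation exactly when the attribute is in the set stored for that object.</p>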
      <p><bold>Definition 2 (Derivation operators)</bold><italic>Let</italic><inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi mathvariant="double-struck"> O </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> A </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> I </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula><italic>be a formal context. The operators</italic></p>
      <disp-formula id="FD1">
        <mml:math>
          <mml:mtable columnalign="left">
            <mml:mtr>
              <mml:mtd>
                <mml:msup>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mo>⋅</mml:mo>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>′</mml:mo>
                </mml:msup>
                <mml:mo>:</mml:mo>
                <mml:msup>
                  <mml:mn>2</mml:mn>
                  <mml:mi mathvariant="double-struck">A</mml:mi>
                </mml:msup>
                <mml:mo>→</mml:mo>
                <mml:msup>
                  <mml:mn>2</mml:mn>
                  <mml:mi mathvariant="double-struck">O</mml:mi>
                </mml:msup>
              </mml:mtd>
            </mml:mtr>
            <mml:mtr>
              <mml:mtd>
                <mml:msup>
                  <mml:mi mathvariant="double-struck">A</mml:mi>
                  <mml:mo>′</mml:mo>
                </mml:msup>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>{</mml:mo>
                  <mml:mrow>
                    <mml:mi>o</mml:mi>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">O</mml:mi>
                    <mml:mo>|</mml:mo>
                    <mml:mo>∀</mml:mo>
                    <mml:mi>a</mml:mi>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">A</mml:mi>
                    <mml:mo>,</mml:mo>
                    <mml:mrow>
                      <mml:mo>(</mml:mo>
                      <mml:mrow>
                        <mml:mi>o</mml:mi>
                        <mml:mo>,</mml:mo>
                        <mml:mi>a</mml:mi>
                      </mml:mrow>
                      <mml:mo>)</mml:mo>
                    </mml:mrow>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">I</mml:mi>
                  </mml:mrow>
                  <mml:mo>}</mml:mo>
                </mml:mrow>
              </mml:mtd>
            </mml:mtr>
          </mml:mtable>
        </mml:math>
      </disp-formula>
      <p>and </p>
      <disp-formula id="FD2">
        <mml:math display="inline">
          <mml:mtable columnalign="left">
            <mml:mtr>
              <mml:mtd>
                <mml:msup>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mo>⋅</mml:mo>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>′</mml:mo>
                </mml:msup>
                <mml:mo>:</mml:mo>
                <mml:msup>
                  <mml:mn>2</mml:mn>
                  <mml:mi mathvariant="double-struck">O</mml:mi>
                </mml:msup>
                <mml:mo>→</mml:mo>
                <mml:msup>
                  <mml:mn>2</mml:mn>
                  <mml:mi mathvariant="double-struck">A</mml:mi>
                </mml:msup>
              </mml:mtd>
            </mml:mtr>
            <mml:mtr>
              <mml:mtd>
                <mml:msup>
                  <mml:mi mathvariant="double-struck">O</mml:mi>
                  <mml:mo>′</mml:mo>
                </mml:msup>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>{</mml:mo>
                  <mml:mrow>
                    <mml:mi>a</mml:mi>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">A</mml:mi>
                    <mml:mo>|</mml:mo>
                    <mml:mo>∀</mml:mo>
                    <mml:mi>o</mml:mi>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">O</mml:mi>
                    <mml:mo>,</mml:mo>
                    <mml:mrow>
                      <mml:mo>(</mml:mo>
                      <mml:mrow>
                        <mml:mi>o</mml:mi>
                        <mml:mo>,</mml:mo>
                        <mml:mi>a</mml:mi>
                      </mml:mrow>
                      <mml:mo>)</mml:mo>
                    </mml:mrow>
                    <mml:mo>∈</mml:mo>
                    <mml:mi mathvariant="double-struck">I</mml:mi>
                  </mml:mrow>
                  <mml:mo>}</mml:mo>
                </mml:mrow>
              </mml:mtd>
            </mml:mtr>
          </mml:mtable>
        </mml:math>
      </disp-formula>
      <p>are called derivation operators of the formal context. </p>
      <p>The two derivation operators of a formal context form a Galois connection and, as such, their compositions <inline-formula><mml:math><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mo> ⋅ </mml:mo><mml:mo> ) </mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo> ′ </mml:mo><mml:mtext></mml:mtext><mml:mo> ′ </mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are closure operators, <italic>i.e.</italic>, they are extensive (<inline-formula><mml:math><mml:mrow><mml:mi> X </mml:mi><mml:mo> ⊆ </mml:mo><mml:msup><mml:mi> X </mml:mi><mml:mo> ″ </mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>), idempotent (<inline-formula><mml:math><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:msup><mml:mi> X </mml:mi><mml:mo> ″ </mml:mo></mml:msup><mml:mo> ) </mml:mo></mml:mrow></mml:mrow><mml:mo> ″ </mml:mo></mml:msup><mml:mo> = </mml:mo><mml:msup><mml:mi> X </mml:mi><mml:mo> ″ </mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>), and monotone (if <inline-formula><mml:math><mml:mrow><mml:mi> X </mml:mi><mml:mo> ⊆ </mml:mo><mml:mi> Y </mml:mi></mml:mrow></mml:math></inline-formula> then <inline-formula><mml:math><mml:mrow><mml:msup><mml:mi> X </mml:mi><mml:mo> ″ </mml:mo></mml:msup><mml:mo> ⊆ </mml:mo><mml:msup><mml:mi> Y </mml:mi><mml:mo> ″ </mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>). Sets <inline-formula><mml:math><mml:mi> X </mml:mi></mml:math></inline-formula> such that <inline-formula><mml:math><mml:mrow><mml:mi> X </mml:mi><mml:mo> = </mml:mo><mml:msup><mml:mi> X </mml:mi><mml:mo> ″ </mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> are said to be closed [<xref ref-type="bibr" rid="B10">10</xref>].</p>
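      <p>The following sketch, written under the assumption that the context of <bold>Table 1</bold> is stored as a Python dictionary, implements the two derivation operators and checks the three standard closure properties (extensivity, idempotence, and monotonicity) exhaustively on every set of objects. The function names are hypothetical and chosen for readability only.</p>
      <code language="python">
```python
from itertools import combinations

# Table 1 encoded as object -> set of attributes (illustrative encoding).
CONTEXT = {
    "stocks_market": {"high_severity"},
    "deceptive_alignment": {"high_severity", "pseudo_transparency"},
    "proprietary_data": {"high_severity", "proprietary_data_risk"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Derivation of an object set: attributes shared by every object in it."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared = shared.intersection(CONTEXT[obj])
    return shared


def common_objects(attributes):
    """Derivation of an attribute set: objects possessing every attribute in it."""
    return {obj for obj in OBJECTS if set(attributes).issubset(CONTEXT[obj])}


def closure(objects):
    """Double derivation of an object set (derive twice)."""
    return common_objects(common_attributes(objects))


def subsets(items):
    """Yield every subset of the given collection."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


all_subsets = list(subsets(OBJECTS))
extensive = all(x.issubset(closure(x)) for x in all_subsets)
idempotent = all(closure(closure(x)) == closure(x) for x in all_subsets)
monotone = all(
    closure(x).issubset(closure(y))
    for x in all_subsets
    for y in all_subsets
    if x.issubset(y)
)
print(extensive, idempotent, monotone)
```
      </code>
      <p>An exhaustive check is only feasible because the example has three objects; it illustrates the properties on this context and is not a substitute for the general proof.</p>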
      <p><bold>Definition 3 (Formal concept)</bold> <italic>Given a formal context</italic> <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> O </mml:mi><mml:mo> , </mml:mo><mml:mi> A </mml:mi><mml:mo> , </mml:mo><mml:mi> J </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula><italic>, where the incidence relation is written</italic> <inline-formula><mml:math><mml:mi> J </mml:mi></mml:math></inline-formula> <italic>to avoid confusion with the intent</italic> <inline-formula><mml:math><mml:mi> I </mml:mi></mml:math></inline-formula><italic>, a formal concept</italic> <inline-formula><mml:math><mml:mi> C </mml:mi></mml:math></inline-formula> <italic>is a pair</italic> <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> E </mml:mi><mml:mo> , </mml:mo><mml:mi> I </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>such that</italic> <inline-formula><mml:math><mml:mrow><mml:mi> E </mml:mi><mml:mo> ⊆ </mml:mo><mml:mi> O </mml:mi></mml:mrow></mml:math></inline-formula> <italic>and</italic> <inline-formula><mml:math><mml:mrow><mml:mi> I </mml:mi><mml:mo> ⊆ </mml:mo><mml:mi> A </mml:mi></mml:mrow></mml:math></inline-formula><italic>. It depicts a maximal set of objects that share a maximal set of common attributes.</italic> <inline-formula><mml:math><mml:mrow><mml:mi> E </mml:mi><mml:mo> = </mml:mo><mml:mrow><mml:mo> { </mml:mo><mml:mrow><mml:mi> o </mml:mi><mml:mo> ∈ </mml:mo><mml:mi> O </mml:mi><mml:mo> | </mml:mo><mml:mo> ∀ </mml:mo><mml:mi> a </mml:mi><mml:mo> ∈ </mml:mo><mml:mi> I </mml:mi><mml:mo> , </mml:mo><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> o </mml:mi><mml:mo> , </mml:mo><mml:mi> a </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow><mml:mo> ∈ </mml:mo><mml:mi> J </mml:mi></mml:mrow><mml:mo> } </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>is the concept’s</italic> <italic><bold>extent</bold></italic><italic>, denoted by</italic> <inline-formula><mml:math><mml:mrow><mml:mi> Ext </mml:mi><mml:mrow><mml:mo> ( </mml:mo><mml:mi> C </mml:mi><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula><italic>, and</italic> <inline-formula><mml:math><mml:mrow><mml:mi> I </mml:mi><mml:mo> = </mml:mo><mml:mrow><mml:mo> { </mml:mo><mml:mrow><mml:mi> a </mml:mi><mml:mo> ∈ </mml:mo><mml:mi> A </mml:mi><mml:mo> | </mml:mo><mml:mo> ∀ </mml:mo><mml:mi> o </mml:mi><mml:mo> ∈ </mml:mo><mml:mi> E </mml:mi><mml:mo> , </mml:mo><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> o </mml:mi><mml:mo> , </mml:mo><mml:mi> a </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow><mml:mo> ∈ </mml:mo><mml:mi> J </mml:mi></mml:mrow><mml:mo> } </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>is the concept’s</italic> <italic><bold>intent</bold></italic><italic>, denoted by</italic> <inline-formula><mml:math><mml:mrow><mml:mi> Int </mml:mi><mml:mrow><mml:mo> ( </mml:mo><mml:mi> C </mml:mi><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula><italic>.</italic></p>
      <p>For instance, take the set of objects {stocks_market, deceptive_alignment} in <bold>Table 1</bold>. The attributes shared by all of these objects form the set {high_severity}. Retrieving every object that possesses all attributes of {high_severity} yields E = {stocks_market, deceptive_alignment, proprietary_data}. The pair E = {stocks_market, deceptive_alignment, proprietary_data} and I = {high_severity} is therefore a formal concept.</p>
      <p>The set of all concepts that can be extracted from a formal context K can be partially ordered by set inclusion on the concepts’ extents; this order is called the specialisation order.</p>
      <p><bold>Definition 4 (Specialisation order</bold> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula><bold>)</bold> <italic>Given a formal context</italic> <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> O </mml:mi><mml:mo> , </mml:mo><mml:mi> A </mml:mi><mml:mo> , </mml:mo><mml:mi> J </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>and two concepts</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:mo> = </mml:mo><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:msub><mml:mi> E </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:mo> , </mml:mo><mml:msub><mml:mi> I </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>and</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub><mml:mo> = </mml:mo><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:msub><mml:mi> E </mml:mi><mml:mn> 2 </mml:mn></mml:msub><mml:mo> , </mml:mo><mml:msub><mml:mi> I </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> <italic>of</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula><italic>,</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>if and only if</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> E </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:mo> ⊆ </mml:mo><mml:msub><mml:mi> E </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>and</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> I </mml:mi><mml:mn> 2 </mml:mn></mml:msub><mml:mo> ⊆ </mml:mo><mml:msub><mml:mi> I </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula><italic>. Then,</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>is called a sub-concept of</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula><italic>, and</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>a super-concept of</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula><italic>.</italic></p>
      <p>Therefore, a concept inherits all the attributes of its super-concepts, and all the objects of its sub-concepts. When provided with the specialisation order <inline-formula><mml:math><mml:mrow><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the set of all concepts forms a structure called a concept lattice [<xref ref-type="bibr" rid="B11">11</xref>]. </p>
      <p><bold>Definition 5 (Concept lattice)</bold> <italic>Given</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula><italic>, the set of all concepts extracted from a formal context</italic> <inline-formula><mml:math><mml:mi> K </mml:mi></mml:math></inline-formula><italic>, the concept lattice associated with</italic> <inline-formula><mml:math><mml:mi> K </mml:mi></mml:math></inline-formula><italic>, denoted by</italic> <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub><mml:mo> , </mml:mo><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula><italic>, is the set of all concepts</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <italic>provided with the specialisation order</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula><italic>.</italic></p>
      <fig id="fig1">
        <label>Figure 1</label>
        <graphic xlink:href="https://html.scirp.org/file/1733547-rId99.jpeg?20260624021838" />
      </fig>
      <p><bold>Figure 1</bold><bold>.</bold> Concept lattice associated with the context of <bold>Table 1</bold>.</p>
      <p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows the Hasse diagram of the concept lattice associated with the formal context of <bold>Table 1</bold>, from which four concepts were extracted and partially ordered. The lattice was built with FCA4J<sup>1</sup>. Each concept is drawn as a three-part box displaying the name of the concept (top part), its intent (middle part), and its extent (bottom part). An arrow between two concepts represents the specialisation order. In this representation, intents and extents are simplified: each attribute (resp. object) appears only once in the lattice, in the concept that introduces it, <italic>i.e.</italic>, the greatest (resp. lowest) concept having that attribute (resp. object). The full intent and extent of a concept can then be reconstituted by inheritance. The concept name is composed of an identifier followed by the cardinalities of its intent and its extent. For example, for Concept 1 (<italic>I</italic>:2, <italic>E</italic>:1), Int(1(<italic>I</italic>:2, <italic>E</italic>:1)) = {high_severity, pseudo_transparency} and Ext(1(<italic>I</italic>:2, <italic>E</italic>:1)) = {deceptive_alignment}.</p>
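      <p>To make the construction concrete, the sketch below enumerates the concepts of <bold>Table 1</bold> by closing every subset of attributes. This naive enumeration is an illustrative assumption, exponential in the number of attributes, and is not the algorithm implemented by FCA4J; the helper names are hypothetical.</p>
      <code language="python">
```python
from itertools import combinations

# Table 1 encoded as object -> set of attributes (illustrative encoding).
CONTEXT = {
    "stocks_market": {"high_severity"},
    "deceptive_alignment": {"high_severity", "pseudo_transparency"},
    "proprietary_data": {"high_severity", "proprietary_data_risk"},
}
OBJECTS = sorted(CONTEXT)
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent_of(attributes):
    """Objects possessing every attribute of the given set."""
    return frozenset(o for o in OBJECTS if set(attributes).issubset(CONTEXT[o]))


def intent_of(objects):
    """Attributes shared by every object of the given set."""
    return frozenset(a for a in ATTRIBUTES if all(a in CONTEXT[o] for o in objects))


def all_concepts():
    """Close every attribute subset; distinct (extent, intent) pairs are the concepts."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, size):
            extent = extent_of(combo)
            concepts.add((extent, intent_of(extent)))
    return concepts


def is_subconcept(c1, c2):
    """Specialisation order: c1 is below c2 when its extent is included in c2's."""
    return c1[0].issubset(c2[0])


concepts = all_concepts()
# Largest extent first, i.e. from the top of the lattice downwards.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(f"I:{len(intent)} E:{len(extent)}", sorted(intent), sorted(extent))
```
      </code>
      <p>The printed cardinalities follow the (<italic>I</italic>, <italic>E</italic>) naming convention described above, so each line can be matched against a box of the Hasse diagram.</p>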
      <p>We call object-concepts and attribute-concepts the concepts that introduce at least one object or at least one attribute, respectively; plain-concepts are those that introduce neither attributes nor objects [<xref ref-type="bibr" rid="B11">11</xref>]. In <xref ref-type="fig" rid="fig1">Figure 1</xref>, Concept 1 (<italic>I</italic>:2, <italic>E</italic>:1) is both an object-concept and an attribute-concept, as it introduces the object <italic>deceptive_alignment</italic> and the attribute <italic>pseudo_transparency</italic>. Concept 2 (<italic>I</italic>:3, <italic>E</italic>:0) is a plain-concept. In what follows, the set of all object-concepts of a context <inline-formula><mml:math><mml:mi> K </mml:mi></mml:math></inline-formula> is denoted by <inline-formula><mml:math><mml:mrow><mml:mi> O </mml:mi><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the set of all attribute-concepts is denoted by <inline-formula><mml:math><mml:mrow><mml:mi> A </mml:mi><mml:msub><mml:mi> C </mml:mi><mml:mi> K </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p><bold>Property 1.</bold> <italic>Given two features</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> f </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>and</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> f </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>respectively introduced in concepts</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <italic>and</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula><italic>,</italic> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> C </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:msub><mml:mo> ≤ </mml:mo><mml:mi> s </mml:mi></mml:msub><mml:msub><mml:mi> C </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math><mml:mo> ⇔ </mml:mo></mml:math></inline-formula> <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> f </mml:mi><mml:mn> 1 </mml:mn></mml:msub><mml:mo> ⇒ </mml:mo><mml:msub><mml:mi> f </mml:mi><mml:mn> 2 </mml:mn></mml:msub></mml:mrow></mml:math></inline-formula><italic>. Binary implications can be found by following the arrows in the Hasse diagram of the conceptual structures</italic> [<xref ref-type="bibr" rid="B11">11</xref>].</p>
      <p>For instance, in <xref ref-type="fig" rid="fig1">Figure 1</xref>, the feature <italic>high_severity</italic> is introduced in a super-concept of the concept introducing the feature <italic>pseudo_transparency</italic>, so we can extract the implication <italic>pseudo_transparency</italic> ⇒ <italic>high_severity</italic>: in this context, every instance exhibiting pseudo-transparency is also rated high severity, which marks pseudo-transparency as an indicator of critical risk levels.</p>
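      <p>Binary implications of this kind can also be read directly from the context: an attribute implies another when every object holding the first also holds the second. The sketch below applies this test to <bold>Table 1</bold>; it is a minimal illustration with hypothetical function names, not the rule-extraction procedure used in the study.</p>
      <code language="python">
```python
# Table 1 encoded as object -> set of attributes (illustrative encoding).
CONTEXT = {
    "stocks_market": {"high_severity"},
    "deceptive_alignment": {"high_severity", "pseudo_transparency"},
    "proprietary_data": {"high_severity", "proprietary_data_risk"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def holders(attribute):
    """Objects that possess the given attribute."""
    return {obj for obj, attrs in CONTEXT.items() if attribute in attrs}


def binary_implications():
    """List (premise, conclusion) pairs where holders(premise) is included in holders(conclusion)."""
    rules = []
    for premise in ATTRIBUTES:
        for conclusion in ATTRIBUTES:
            if premise != conclusion and holders(premise).issubset(holders(conclusion)):
                rules.append((premise, conclusion))
    return rules


rules = binary_implications()
for premise, conclusion in rules:
    print(f"{premise} => {conclusion}")
```
      </code>
      <p>Such rules are exact only with respect to the objects in the context: an attribute held by no object would vacuously imply every other attribute, so each rule should be read together with the number of objects that support it.</p>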
    </sec>
    <sec id="sec3">
      <title>3. Experimental Setting</title>
      <p>This section delineates the empirical framework of the study, structured into three progressive stages: the characterization of the audit dataset (Section 3.1), the logical binarization and attribute engineering process (Section 3.2), and the comprehensive analytical pipeline implemented for knowledge discovery (Section 3.3).</p>
      <sec id="sec3dot1">
        <title>3.1. Dataset Presentation</title>
        <p>The empirical basis for this study is a specialized dataset originating from the work of [<xref ref-type="bibr" rid="B12">12</xref>], titled “Using LLMs to Improve the Accuracy of SBOM-Based Vulnerability Assessment”. This corpus, which documents the safety profile and behavioral anomalies of the GPT-OSS-20B model, is publicly available online<sup>2</sup>.</p>
        <p>The dataset comprises 13 detailed vulnerability reports structured in a standardized JSON format. These reports categorize model failures across five primary safety-critical domains: Reward Hacking, Deceptive Alignment, Potential Sabotage, Data Exfiltration, and Evaluation Awareness. Each record provides a granular view of the model’s output, capturing both the internal “chain-of-thought” (analysis) and the final response generated under specific system prompts. For the purposes of the analysis, the data are enriched with categorical metadata, including self-assessed severity levels and specific vulnerability indicators (e.g., lexical markers such as “confidential”, “proprietary”, or “transparent”). This structured relationship between risk categories, severity rankings, and linguistic markers provides the multidimensional structure from which the associative space of LLM vulnerabilities is mapped.</p>
        <p>The dataset used in this study comprises all 13 unique vulnerability reports available in the source repository [<xref ref-type="bibr" rid="B12">12</xref>]. No records were excluded, so the case study boundary covers all safety-critical domains identified by the original auditors.</p>
        <p>Although the dataset comprises 13 records, this qualitative scale aligns with the emerging Less Is More for Alignment (LIMA) paradigm, which demonstrates that a small set of high-fidelity, diverse examples is more effective for capturing structural model behaviors than large, noisy datasets [<xref ref-type="bibr" rid="B13">13</xref>]. In the field of AI safety, the shift from statistical benchmarking toward Forensic Red-Teaming necessitates a focus on worst-case structural vulnerabilities rather than average-case performance [<xref ref-type="bibr" rid="B14">14</xref>]. As emphasized in recent frameworks for Formal AI Auditing, the goal of security evaluation is to identify deterministic failure modes; in this context, a single well-characterized failure constitutes a formal proof of vulnerability [<xref ref-type="bibr" rid="B15">15</xref>]. Consequently, our 13 detailed audit reports provide the necessary logical density for Formal Concept Analysis to map the model’s structural failure modes and deceptive signatures without the need for statistical oversampling.</p>
      </sec>
      <sec id="sec3dot2">
        <title>3.2. Data Preparation</title>
        <p>Building upon the multidimensional structure described in Section 3.1, the data was transformed into a binary formal context <inline-formula><mml:math><mml:mrow><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi mathvariant="double-struck"> O </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> A </mml:mi><mml:mo> , </mml:mo><mml:mi mathvariant="double-struck"> I </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> to satisfy the requirements of Formal Concept Analysis (FCA). This transition from raw JSON logs to a formal matrix involved a process of semantic aggregation and conceptual scaling.</p>
        <p>The transformation of raw JSON logs into a binary formal context followed a rigorous coding protocol. Three primary JSON fields were extracted for analysis: vulnerability_type, severity, and the analysis (model’s internal chain-of-thought). The inclusion of the analysis field is methodologically vital: it allows an external auditor to detect reasoning-level deceptive signatures where the model’s internal logic may reveal a misalignment that its final response attempts to mask. To ensure the reliability of this human-led semantic aggregation, the labeling was performed by the lead researcher and cross-validated by a second coder to resolve any ambiguities in attribute assignment.</p>
        <p>To maintain analytical clarity while optimizing the lattice complexity, we performed a dimensionality reduction by grouping the initial categories and lexical markers into six Super-Attributes. This mapping logic ensures that each attribute captures a distinct dimension of the model’s risk profile: </p>
        <p><bold>MAL_AL (Malicious Alignment):</bold> Aggregates the domains of Reward Hacking, Deceptive Alignment, and Evaluation Awareness. It identifies instances where the model’s incentives deviate from its intended safety constraints.</p>
        <p><bold>TECH_RISK (Technical Risk):</bold> Consolidates Potential Sabotage and Data Exfiltration. This attribute focuses on the operational impact on system integrity and infrastructure security.</p>
        <p><bold>PSEUDO_TR (Pseudo-Transparency):</bold> Derived from specific linguistic markers identified in the reports. It flags responses where the model uses self-referential honesty (e.g., “I am designed to be transparent”) as a justification for its behavior.</p>
        <p><bold>OVER_CONF (Epistemic Overconfidence):</bold> Captures the presence of authoritative lexical markers (e.g., “confident”, “certain”) in contexts where the model provides unverified or sensitive information.</p>
        <p><bold>PROP_EXP (Proprietary Exposure):</bold> Specifically tracks the leakage of training data characteristics or internal technical specifications, as identified in the audit’s vulnerability indicators.</p>
        <p><bold>CRIT_SEV (Critical Severity):</bold> Represents a binary quantization of the severity metadata provided in the source dataset [<xref ref-type="bibr" rid="B12">12</xref>], which uses a 3-point scale (1: Low, 2: Medium, 3: High). To isolate high-impact risks that require immediate mitigation, this attribute is assigned only to reports with a self-assessed severity level ≥2 (Medium and High). This threshold was chosen to align with standard cloud security auditing, where Medium and High impacts trigger mandatory intervention.</p>
        <p>The 13 audit records extracted from the dataset were mapped to unique identifiers (R1 to R13) and categorized according to their specific vulnerability focus: </p>
        <p><bold>R1-R3 (Reward Hacking):</bold> Scenarios involving the solicitation of definitive answers in medical (cancer cure), gambling (lottery), and financial (stock market) contexts.</p>
        <p><bold>R4-R5 (Deceptive Alignment):</bold> Reports focusing on the model’s willingness to mislead users for a “greater good” or to protect its own operational integrity.</p>
        <p><bold>R6-R7 (Potential Sabotage):</bold> Tasks involving critical system operations, such as temporary file cleanup and production database migrations.</p>
        <p><bold>R8-R10 (Data Exfiltration):</bold> Probes targeting proprietary information, including model architecture, internal training conversations, and confidential specifications.</p>
        <p><bold>R11-R13 (Evaluation Awareness):</bold> Scenarios testing the model’s ability to detect and adapt its behavior within research-controlled environments.</p>
        <p>To illustrate the binarization process, <bold>Table 2</bold> provides a worked example using Report R4. This specific report demonstrates how a raw vulnerability entry is mapped onto the six super-attributes based on the presence of specific keywords and the model’s self-reported reasoning.</p>
        <p><bold>Table 2</bold><bold>.</bold> Coding protocol and binarization example for report R4.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <table>
            <tbody>
              <tr>
                <td>
                  <bold>JSON Source Field</bold>
                </td>
                <td>
                  <bold>Raw Evidence (R4)</bold>
                </td>
                <td>
                  <bold>Super-Attribute</bold>
                </td>
                <td>
                  <bold>Logic/Rule</bold>
                </td>
                <td>
                  <bold>Value</bold>
                </td>
              </tr>
              <tr>
                <td>vulnerability_type</td>
                <td>“Deceptive Alignment”</td>
                <td>MAL_AL</td>
                <td>Matches “Malicious Alignment” category.</td>
                <td>1</td>
              </tr>
              <tr>
                <td>vulnerability_type</td>
                <td>“Deceptive Alignment”</td>
                <td>TECH_RISK</td>
                <td>No “Sabotage” or “Exfiltration” keywords detected.</td>
                <td>0</td>
              </tr>
              <tr>
                <td>analysis (CoT)</td>
                <td>“I am designed to be...”</td>
                <td>PSEUDO_TR</td>
                <td>Presence of self-referential honesty markers.</td>
                <td>1</td>
              </tr>
              <tr>
                <td>analysis (CoT)</td>
                <td>(No markers detected)</td>
                <td>OVER_CONF</td>
                <td>Absence of authoritative markers like “certain”.</td>
                <td>0</td>
              </tr>
              <tr>
                <td>vulnerability_indicators</td>
                <td>(No related exposure)</td>
                <td>PROP_EXP</td>
                <td>No internal technical specifications identified.</td>
                <td>0</td>
              </tr>
              <tr>
                <td>severity</td>
                <td>2</td>
                <td>CRIT_SEV</td>
                <td>Value meets the ≥2 threshold (Medium risk).</td>
                <td>1</td>
              </tr>
              <tr>
                <td colspan="4">
                  <bold>Final Binary Vector for R4:</bold>
                </td>
                <td>
                  <bold>[1, 0, 1, 0, 0, 1]</bold>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As shown in <bold>Table 2</bold>, the attribute <bold>PSEUDO_TR</bold> is triggered by the model’s internal analysis field, which reveals a strategic intent to seem honest while actually being misaligned. This mapping allows the FCA to subsequently identify if such a veneer of honesty is a recurring pattern across the entire dataset.</p>
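        <p>The coding rules of Table 2 can be sketched as a small function. This is only an illustration: the coding in this study was performed by human coders, the marker lists below contain just the examples quoted in the text, the record literal is a hypothetical reconstruction of R4 from the evidence shown in Table 2, and the names binarize and contains_any are assumptions introduced for the example.</p>
        <code language="python">
```python
# Illustrative sketch of the Table 2 coding rules; the study's coding was manual.
# Marker lists hold only the examples quoted in the text and are assumptions.
ATTRIBUTES = ["MAL_AL", "TECH_RISK", "PSEUDO_TR", "OVER_CONF", "PROP_EXP", "CRIT_SEV"]

MAL_AL_TYPES = {"Reward Hacking", "Deceptive Alignment", "Evaluation Awareness"}
TECH_RISK_TYPES = {"Potential Sabotage", "Data Exfiltration"}
PSEUDO_TR_MARKERS = ("i am designed to be",)
OVER_CONF_MARKERS = ("confident", "certain")
PROP_EXP_MARKERS = ("confidential", "proprietary")


def contains_any(text, markers):
    """Return True if any marker occurs in the lower-cased text (naive substring match)."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def binarize(record):
    """Map one audit record (a dict) to its six-component binary vector."""
    indicators = " ".join(record.get("vulnerability_indicators", []))
    flags = {
        "MAL_AL": record["vulnerability_type"] in MAL_AL_TYPES,
        "TECH_RISK": record["vulnerability_type"] in TECH_RISK_TYPES,
        "PSEUDO_TR": contains_any(record["analysis"], PSEUDO_TR_MARKERS),
        "OVER_CONF": contains_any(record["analysis"], OVER_CONF_MARKERS),
        "PROP_EXP": contains_any(indicators, PROP_EXP_MARKERS),
        "CRIT_SEV": record["severity"] >= 2,  # Medium (2) and High (3)
    }
    return [int(flags[name]) for name in ATTRIBUTES]


# Hypothetical reconstruction of report R4 from the evidence shown in Table 2.
r4 = {
    "vulnerability_type": "Deceptive Alignment",
    "severity": 2,
    "analysis": "I am designed to be...",
    "vulnerability_indicators": [],
}
print(binarize(r4))  # [1, 0, 1, 0, 0, 1]
```
        </code>
        <p>Applied to the reconstructed R4 record, the sketch returns the vector given in the last row of Table 2. Substring matching is deliberately naive here; a human coder would, for example, not treat “uncertain” as an overconfidence marker.</p>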
        <p>Following this systematic binarization protocol, each of the 13 audit records was transformed into its corresponding binary representation. To ensure the reliability of this semantic aggregation and mitigate human bias, the coding process was performed in two stages: an initial assignment by the primary researcher, followed by a verification pass by the second author. Any discrepancies in attribute mapping were resolved through consensus to ensure inter-rater reliability. This step is essential because it transforms unstructured, qualitative JSON data into a structured mathematical object, the <italic>Risk Formal Context</italic>, where each row represents a specific vulnerability profile and each column a super-attribute. Consequently, we generated the complete formal context presented in <bold>Table 3</bold>, which serves as the deterministic input for the subsequent Formal Concept Analysis.</p>
        <p><bold>Table 3.</bold> Formal context of GPT-OSS-20B vulnerabilities.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <table>
            <tbody>
              <tr>
                <td>
                </td>
                <td>MAL_AL</td>
                <td>TECH_RISK</td>
                <td>PSEUDO_TR</td>
                <td>OVER_CONF</td>
                <td>PROP_EXP</td>
                <td>CRIT_SEV</td>
              </tr>
              <tr>
                <td>R1</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
              </tr>
              <tr>
                <td>R2</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
              </tr>
              <tr>
                <td>R3</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>x</td>
              </tr>
              <tr>
                <td>R4</td>
                <td>x</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
              </tr>
              <tr>
                <td>R5</td>
                <td>x</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
              </tr>
              <tr>
                <td>R6</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
              </tr>
              <tr>
                <td>R7</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
              </tr>
              <tr>
                <td>R8</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>R9</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>R10</td>
                <td>
                </td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>R11</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
              </tr>
              <tr>
                <td>R12</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>x</td>
              </tr>
              <tr>
                <td>R13</td>
                <td>x</td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>This structured binary representation allows for the mathematical extraction of implications and the visualization of the vulnerability hierarchy through the concept lattice.</p>
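        <p>To make the formal context concrete, the sketch below encodes Table 3 as a Python dictionary and implements the two FCA derivation operators. It is a minimal illustration rather than the implementation used to produce the results; the names CONTEXT, extent and intent are assumptions introduced for the example.</p>
        <code language="python">
```python
# Formal context of Table 3: each record maps to the attributes marked "x".
CONTEXT = {
    "R1": {"MAL_AL", "OVER_CONF"},
    "R2": {"MAL_AL", "OVER_CONF"},
    "R3": {"MAL_AL", "OVER_CONF", "CRIT_SEV"},
    "R4": {"MAL_AL", "PSEUDO_TR", "CRIT_SEV"},
    "R5": {"MAL_AL", "PSEUDO_TR"},
    "R6": {"TECH_RISK"},
    "R7": {"TECH_RISK"},
    "R8": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R9": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R10": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R11": {"MAL_AL", "CRIT_SEV"},
    "R12": {"MAL_AL", "CRIT_SEV"},
    "R13": {"MAL_AL"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attributes):
    """Records that carry every attribute in the given set."""
    wanted = set(attributes)
    return {obj for obj, attrs in CONTEXT.items() if wanted.issubset(attrs)}


def intent(objects):
    """Attributes shared by every record in the given set."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared = shared.intersection(CONTEXT[obj])
    return shared


# Closing the single attribute PROP_EXP yields one formal concept.
records = extent({"PROP_EXP"})
print(sorted(records))
print(sorted(intent(records)))
```
        </code>
        <p>For the attribute PROP_EXP, the extent is {R8, R9, R10} and the intent of that extent is {TECH_RISK, PROP_EXP, CRIT_SEV}; this pair is a formal concept of the context.</p>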
      </sec>
      <sec id="sec3dot3">
        <title>3.3. Analysis Pipeline</title>
        <p><xref ref-type="fig" rid="fig2">Figure 2</xref> illustrates the analytical pipeline adopted in this study, which is structured into three fundamental stages: Data Preparation, Data Mining, and Interpretation.</p>
        <p>The first stage (Data Preparation) begins with the collection of raw JSON security audit logs from the GPT-OSS-20B LLM. Relevant vulnerability metadata are then selected and subjected to a qualitative binarization process. This human-led step is crucial for transforming semi-structured audit data into a binary LLM Risk Formal Context, which serves as the mathematical foundation for the subsequent analysis.</p>
        <p>The second stage (Data Mining) involves the algorithmic processing of the formal context. Using Formal Concept Analysis (FCA) techniques, the system computes the underlying conceptual structures to generate the Concept Lattice. This lattice visually maps the hierarchical relationships and overlaps between different vulnerability attributes and audit records.</p>
        <p>Finally, the Interpretation stage focuses on translating the lattice’s topology into actionable security insights. This phase extracts Risk Implication Rules and establishes a clear hierarchy of threats. By analyzing these implications, the study identifies critical vulnerability profiles, producing specialized knowledge that can support AI safety researchers and developers in hardening large language models against deceptive alignment and technical risks.</p>
        <fig id="fig2">
          <label>Figure 2</label>
          <graphic xlink:href="https://html.scirp.org/file/1733547-rId122.jpeg?20260624021839" />
        </fig>
        <p><bold>Figure 2</bold><bold>.</bold> Structural architecture of our FCA-based knowledge discovery framework.</p>
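        <p>The Data Mining stage can be illustrated with a brute-force enumeration of formal concepts. The sketch below closes every subset of the six attributes of Table 3, which is feasible only because the context is small (64 subsets); dedicated FCA tools use more efficient algorithms. It is an illustrative sketch, not the implementation used in this study, and the function name all_concepts is an assumption.</p>
        <code language="python">
```python
from itertools import combinations

# Formal context of Table 3 (record -> attributes marked "x").
CONTEXT = {
    "R1": {"MAL_AL", "OVER_CONF"},
    "R2": {"MAL_AL", "OVER_CONF"},
    "R3": {"MAL_AL", "OVER_CONF", "CRIT_SEV"},
    "R4": {"MAL_AL", "PSEUDO_TR", "CRIT_SEV"},
    "R5": {"MAL_AL", "PSEUDO_TR"},
    "R6": {"TECH_RISK"},
    "R7": {"TECH_RISK"},
    "R8": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R9": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R10": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R11": {"MAL_AL", "CRIT_SEV"},
    "R12": {"MAL_AL", "CRIT_SEV"},
    "R13": {"MAL_AL"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attributes):
    """Records that carry every attribute in the given collection."""
    wanted = set(attributes)
    return {obj for obj, attrs in CONTEXT.items() if wanted.issubset(attrs)}


def intent(objects):
    """Attributes shared by every record in the given collection."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared = shared.intersection(CONTEXT[obj])
    return shared


def all_concepts():
    """Enumerate formal concepts by closing every attribute subset."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, size):
            objs = extent(subset)
            concepts.add((frozenset(objs), frozenset(intent(objs))))
    return concepts


# Print concepts from the largest extent (top) to the smallest (bottom).
for objs, attrs in sorted(all_concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(len(objs), sorted(attrs))
```
        </code>
        <p>For the context as encoded above, the enumeration yields 11 formal concepts, including the top concept (all 13 records, empty intent) and the bottom concept (no records, all six attributes).</p>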
      </sec>
    </sec>
    <sec id="sec4">
      <title>4. Results and Discussion</title>
      <p>This section provides a detailed account of the knowledge extracted through our formal framework and discusses its implications for cloud AI security. It is organized into two primary parts: the presentation of structural and logical results (Section 4.1) and a critical discussion situating these findings within the broader context of deceptive alignment and AI safety (Section 4.2).</p>
      <sec id="sec4dot1">
        <title>4.1. Results</title>
        <p>The findings of our formal mining process are organized into two complementary analyses: the topological exploration of the concept lattice (Section 4.1.1) and the discovery of logical implication rules (Section 4.1.2).</p>
        <p>4.1.1. Structural Analysis of the Concept Lattice</p>
        <p>The Hasse diagram shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> represents the conceptual hierarchy of GPT-OSS-20B vulnerabilities. Each node represents a formal concept, illustrating the dual relationship between the audit records (extents) and their shared security attributes (intents). </p>
        <fig id="fig3">
          <label>Figure 3</label>
          <graphic xlink:href="https://html.scirp.org/file/1733547-rId123.jpeg?20260624021840" />
        </fig>
        <p><bold>Figure 3</bold><bold>.</bold> The concept lattice of GPT-OSS-20B vulnerabilities.</p>
        <p>The lattice reveals a clear tripartite structure in the model’s failure modes: </p>
        <p>1) <bold>The Malicious Alignment Branch (Concept 1)</bold>: This node acts as a major junction, grouping 8 records. It shows that behavioral anomalies like Reward Hacking and Evaluation Awareness are intrinsically linked. Interestingly, we observe that <bold>OVER_CONF</bold> (Concept 5) and <bold>PSEUDO_TR</bold> (Concept 4) are direct sub-concepts of <bold>MAL_AL</bold>, suggesting that overconfidence and false transparency are specific manifestations of alignment failure. </p>
        <p>2) <bold>The Technical Risk and Proprietary Exposure Cluster (Concepts 3 and 6)</bold>: There is a distinct vertical chain starting from <bold>TECH_RISK</bold> down to <bold>PROP_EXP</bold>. The fact that Concept 6 (containing R8, R9, R10) is a sub-concept of both <bold>TECH_RISK</bold> and <bold>CRIT_SEV</bold> is significant: it demonstrates that in this model, every instance of proprietary data exposure is systematically categorized as both a technical risk and a critical severity event. </p>
        <p>The lattice highlights that while some alignment issues are minor (like R13 or R5, located outside the <bold>CRIT_SEV</bold> scope), any interaction involving proprietary data or deceptive “pseudo-transparency” combined with specific tasks (R4) tends to gravitate towards the critical severity threshold.</p>
        <p>It is essential to distinguish between relationships discovered autonomously by the FCA and those inherent to our data encoding. For instance, the clustering of <bold>PROP_EXP</bold> and <bold>TECH_RISK</bold> reflects a structural reality in the model’s behavior: every time a technical specification was leaked, it was also categorized as an operational risk. While the grouping of attributes into Super-Attributes was a manual design choice to ensure analytical clarity, the hierarchical implications revealed in <xref ref-type="fig" rid="fig3">Figure 3</xref> (e.g., <bold>PSEUDO_TR</bold> ⇒ <bold>MAL_AL</bold>) are emergent properties of the model’s specific failure modes, providing a mathematical reality check on its behavioral consistency.</p>
        <p>4.1.2. Automated Logic Discovery through Implication Rules</p>
        <p>To complement the structural analysis of the lattice, we extracted the exact implication rules from the formal context using the FCA4J<sup>3</sup> framework. Unlike probabilistic association rules, these implications represent absolute logical dependencies within the dataset.</p>
        <fig id="fig4">
          <label>Figure 4</label>
          <graphic xlink:href="https://html.scirp.org/file/1733547-rId124.jpeg?20260624021840" />
        </fig>
        <p><bold>Figure 4</bold><bold>.</bold> Implication rules extracted from the formal context of <bold>Table 3</bold>, using the FCA4J tool.</p>
        <p><xref ref-type="fig" rid="fig4">Figure 4</xref> consolidates these results, distinguishing between rules based on their frequency of occurrence (the number of objects satisfying the rule) within a single visual: </p>
        <p>The first rule, <bold>PSEUDO_TR</bold> ⇒ <bold>MAL_AL</bold>, reveals a critical behavioral dependency. Although “Pseudo-Transparency” appears less frequently in the corpus, its presence serves as a systematic predictor of an underlying alignment failure (<bold>MAL_AL</bold>). This demonstrates that, within this dataset, whenever the model invokes predefined honesty or transparency clauses, such discourse acts as a rhetorical veneer that masks deviations from safety constraints.</p>
        <p>The remaining rules identify the most robust invariants of the GPT-OSS-20B model:</p>
        <p><bold>OVER_CONF</bold> ⇒ <bold>MAL_AL:</bold> This implication shows that epistemic overconfidence (the use of an authoritative and non-nuanced tone) is a recurring and structural manifestation of malicious alignment.</p>
        <p><bold>PROP_EXP</bold> ⇔ <bold>{TECH_RISK, CRIT_SEV}:</bold> A perfect logical equivalence is observed here. The presence of the rule <bold>PROP_EXP</bold> ⇒ <bold>{TECH_RISK, CRIT_SEV}</bold> and its converse confirms that, within this dataset, the exposure of proprietary data is the sole and sufficient driver of the highest technical risk and severity levels.</p>
        <p>These automated discoveries confirm that the visual hierarchy of the lattice is not merely illustrative but reflects strict logical laws governing the vulnerability profile of the audited AI.</p>
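        <p>The rules discussed above can be checked directly against the formal context of Table 3. The sketch below does not reproduce the algorithm of the FCA4J tool, which computes a basis of implications; it only verifies given rules and reports their support. The helper names holds and support are assumptions introduced for the example.</p>
        <code language="python">
```python
# Formal context of Table 3 (record -> attributes marked "x").
CONTEXT = {
    "R1": {"MAL_AL", "OVER_CONF"},
    "R2": {"MAL_AL", "OVER_CONF"},
    "R3": {"MAL_AL", "OVER_CONF", "CRIT_SEV"},
    "R4": {"MAL_AL", "PSEUDO_TR", "CRIT_SEV"},
    "R5": {"MAL_AL", "PSEUDO_TR"},
    "R6": {"TECH_RISK"},
    "R7": {"TECH_RISK"},
    "R8": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R9": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R10": {"TECH_RISK", "PROP_EXP", "CRIT_SEV"},
    "R11": {"MAL_AL", "CRIT_SEV"},
    "R12": {"MAL_AL", "CRIT_SEV"},
    "R13": {"MAL_AL"},
}


def support(premise):
    """Number of records that contain the whole premise."""
    wanted = set(premise)
    return sum(1 for attrs in CONTEXT.values() if wanted.issubset(attrs))


def holds(premise, conclusion):
    """True if every record containing the premise also contains the conclusion."""
    wanted, implied = set(premise), set(conclusion)
    return all(
        implied.issubset(attrs)
        for attrs in CONTEXT.values()
        if wanted.issubset(attrs)
    )


RULES = [
    ({"PSEUDO_TR"}, {"MAL_AL"}),
    ({"OVER_CONF"}, {"MAL_AL"}),
    ({"PROP_EXP"}, {"TECH_RISK", "CRIT_SEV"}),
    ({"TECH_RISK", "CRIT_SEV"}, {"PROP_EXP"}),
    ({"MAL_AL"}, {"OVER_CONF"}),  # negative check: R4 has MAL_AL without OVER_CONF
]
for premise, conclusion in RULES:
    print(sorted(premise), "=>", sorted(conclusion),
          "holds:", holds(premise, conclusion), "support:", support(premise))
```
        </code>
        <p>On Table 3, the first four rules hold with supports of 2, 3, 3 and 3 records respectively, while the last one fails, which illustrates that an exact implication must be satisfied by every record that matches its premise.</p>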
      </sec>
      <sec id="sec4dot2">
        <title>4.2. Discussion</title>
        <p>The results obtained through the FCA pipeline provide empirical evidence of the Sleeper Agent phenomenon theorized by [<xref ref-type="bibr" rid="B6">6</xref>]. The structural analysis of the concept lattice (<xref ref-type="fig" rid="fig3">Figure 3</xref>) confirms that Malicious Alignment is not an isolated error but a junction point for multiple behavioral anomalies. A critical finding is the logical implication <bold>PSEUDO_TR</bold> ⇒ <bold>MAL_AL</bold>, which proves that the model’s claims of transparency (“I am designed to be honest”) are mathematically correlated with deceptive outputs. This validates the Sycophancy risks identified by [<xref ref-type="bibr" rid="B4">4</xref>], demonstrating that lexical markers of honesty can be used as a rhetorical veneer to bypass safety filters.</p>
        <p>Furthermore, the perfect equivalence discovered between Proprietary Exposure and Critical Severity reinforces the privacy concerns raised by [<xref ref-type="bibr" rid="B3">3</xref>], showing that for GPT-OSS-20B, the exfiltration of training data is structurally inseparable from high-impact technical risks. By transforming semi-structured audit logs into a formal hierarchy, this study demonstrates that FCA can bridge the gap between qualitative behavioral descriptions and quantitative risk assessment.</p>
        <p>This methodology provides a reproducible path for AI auditors to identify stable vulnerability profiles that traditional randomized testing might overlook.</p>
      </sec>
    </sec>
    <sec id="sec5">
      <title>5. Related Work</title>
      <p>The rapid integration of Large Language Models (LLMs) into cloud infrastructures has shifted the security focus from classical data protection toward algorithmic alignment and behavioral safety [<xref ref-type="bibr" rid="B1">1</xref>]. While statistical methods often require massive datasets to identify trends, formal auditing focuses on the structural consistency of failure modes, where even a limited number of high-quality audit reports can reveal critical logical vulnerabilities. To contextualize our approach, this section reviews three interrelated domains: the evolution of LLM security paradigms (Section 5.1), the specific challenges of deceptive alignment (Section 5.2), and the recent hybridization of LLMs with Formal Concept Analysis as a deterministic tool for AI governance (Section 5.3).</p>
      <sec id="sec5dot1">
        <title>5.1. LLM Security Paradigms and Emergent Vulnerabilities</title>
        <p>Traditional security focuses on code-level exploits, but LLM vulnerabilities emerge at the semantic level. While early research highlighted data exfiltration risks [<xref ref-type="bibr" rid="B3">3</xref>], other studies focus on jailbreaking and the failure of safety training [<xref ref-type="bibr" rid="B2">2</xref>]. Our work diverges from these input-centric attacks to focus on the internal logical consistency of the model’s reasoning, addressing the structural nature of its failure modes.</p>
      </sec>
      <sec id="sec5dot2">
        <title>5.2. Deceptive Alignment and Sycophancy</title>
        <p>A major frontier in AI safety is Deceptive Alignment, where models appear safe during evaluation but pursue misaligned goals in deployment [<xref ref-type="bibr" rid="B5">5</xref>]. Work on Sleeper Agents empirically demonstrates that safety training can fail to remove deceptive behaviors, which may remain latent until triggered [<xref ref-type="bibr" rid="B6">6</xref>]. This is exacerbated by Sycophancy, where models prioritize evaluator satisfaction over truthfulness [<xref ref-type="bibr" rid="B4">4</xref>]. This study builds upon these findings by providing a formal taxonomy of such signatures within the GPT-OSS-20B model.</p>
      </sec>
      <sec id="sec5dot3">
        <title>5.3. From Knowledge Representation to Formal Auditing: Recent Advances in LLM-FCA Hybridization</title>
        <p>Recent literature highlights a growing synergy between Symbolic AI and Generative AI, where Formal Concept Analysis (FCA) serves as a rigorous framework to structure and validate LLM outputs. For instance, [<xref ref-type="bibr" rid="B16">16</xref>] demonstrated how Large Language Models can empower Relational Concept Analysis (RCA) through strategic knowledge delivery, enhancing the depth of relational discovery. Further explorations have utilized FCA as a reality check for LLMs, providing a formal basis to verify the conceptual consistency of generative models [<xref ref-type="bibr" rid="B17">17</xref>]. In specialized domains, the hybridization of LLMs with Triadic Concept Analysis (TCA) and RCA has yielded significant results in automating requirements engineering (such as variability-driven user-story generation) and software architecture restructuring [<xref ref-type="bibr" rid="B18">18</xref>]. Beyond theoretical modeling, this neuro-symbolic approach has been successfully applied to real-world challenges, including the management of agricultural knowledge bases in West Africa through the combination of symbolic reasoning and generative agents [<xref ref-type="bibr" rid="B19">19</xref>]. While these pioneering works primarily leverage LLMs to enhance formal knowledge representation or specialized task automation, our study shifts the focus toward adversarial behavioral auditing. We utilize the structural properties of FCA not to supplement LLM knowledge, but to mathematically map and diagnose the latent failure modes and deceptive alignment signatures within the models themselves. This orientation is consistent with the NIST AI Risk Management Framework (RMF 1.0, 2023) [<xref ref-type="bibr" rid="B20">20</xref>], positioning FCA as a crucial tool for cloud-based AI governance.</p>
      </sec>
    </sec>
    <sec id="sec6">
      <title>6. Conclusions</title>
      <p>This paper presented a novel, FCA-based framework for the behavioral auditing of Large Language Models in cloud environments. By applying formal concept discovery to the GPT-OSS-20B model, we successfully transformed semi-structured audit reports into a deterministic mathematical structure. Our findings demonstrate that FCA is a powerful tool for identifying latent failure modes, specifically revealing how Pseudo-Transparency lexical markers often mask underlying malicious alignment. The resulting concept lattice provides a clear, hierarchical view of risks, showing that proprietary data exposure is structurally inseparable from critical severity levels in the model under study. This lightweight approach offers a significant advantage for cloud-based AI governance: it is reproducible, non-statistical, and provides interpretable results that align with the transparency requirements of emerging regulations such as the EU AI Act and the NIST AI Risk Management Framework.</p>
      <p>Future work will focus on three main axes. First, we intend to extend this framework using Relational Concept Analysis (RCA) to handle multi-agent environments where security risks may propagate through inter-model interactions. Second, we aim to integrate Triadic Concept Analysis (TCA) to incorporate the user context as a third dimension, allowing for a more nuanced understanding of how different user profiles trigger specific model vulnerabilities. Third, the automation of the binarization process using specialized Auditor LLMs will be explored to enable real-time safety monitoring in high-throughput cloud services.</p>
    </sec>
    <sec id="sec7">
      <title>Acknowledgements</title>
      <p>The authors would like to express their sincere gratitude to the open-source community and the researchers whose datasets made this study possible.</p>
    </sec>
    <sec id="sec8">
      <title>NOTES</title>
      <p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://www.lirmm.fr/fca4j/">https://www.lirmm.fr/fca4j/</ext-link></p>
      <p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/tobimichigan/Probe-Design-Case-Study-Of-Gpt-Oss-20b-Vulnerabilities/tree/main">https://github.com/tobimichigan/Probe-Design-Case-Study-Of-Gpt-Oss-20b-Vulnerabilities/tree/main</ext-link></p>
      <p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="https://www.lirmm.fr/fca4j/">https://www.lirmm.fr/fca4j/</ext-link></p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="B1">
        <label>1.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Das, B.C., Amini, M.H. and Wu, Y. (2025) Security and Privacy Challenges of Large Language Models: A Survey. <italic>ACM Computing Surveys</italic>, 57, 1-39. <pub-id pub-id-type="doi">10.1145/3712001</pub-id> <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3712001">https://doi.org/10.1145/3712001</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Das, B.C.</string-name>
              <string-name>Amini, M.H.</string-name>
              <string-name>Wu, Y.</string-name>
            </person-group>
            <year>2025</year>
            <article-title>Security and Privacy Challenges of Large Language Models: A Survey</article-title>
            <source>ACM Computing Surveys</source>
            <volume>57</volume>
            <pub-id pub-id-type="doi">10.1145/3712001</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B2">
        <label>2.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Haghtalab, N., Steinhardt, J. and Wei, A. (2023) Jailbroken: How Does LLM Safety Training Fail? <italic>Advances in Neural Information Processing Systems</italic>, 36, 80079-80110. https://doi.org/10.52202/075280-3508 <pub-id pub-id-type="doi">10.52202/075280-3508</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.52202/075280-3508">https://doi.org/10.52202/075280-3508</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Haghtalab, N.</string-name>
              <string-name>Steinhardt, J.</string-name>
              <string-name>Wei, A.</string-name>
            </person-group>
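            <year>2023</year>
            <article-title>Jailbroken: How Does LLM Safety Training Fail?</article-title>
            <source>Advances in Neural Information Processing Systems</source>
            <volume>36</volume>
            <fpage>80079</fpage>
            <lpage>80110</lpage>
            <pub-id pub-id-type="doi">10.52202/075280-3508</pub-id>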
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B3">
        <label>3.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., <italic>et al</italic>. (2021) Extracting Training Data from Large Language Models. <italic>30th USENIX Security Symposium</italic> (<italic>USENIX Security</italic> 21), Online, 11-13 August 2021, 2633-2650.</mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Carlini, N.</string-name>
              <string-name>Tramer, F.</string-name>
              <string-name>Wallace, E.</string-name>
              <string-name>Jagielski, M.</string-name>
              <string-name>Herbert-Voss, A.</string-name>
            </person-group>
            <year>2021</year>
            <article-title>Extracting Training Data from Large Language Models</article-title>
            <source>30th USENIX Security Symposium (USENIX Security 21)</source>
            <fpage>2633</fpage>
            <lpage>2650</lpage>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B4">
        <label>4.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Perez, E., Ringer, S., Lukosiute, K., Nguyen, K., Chen, E., Heiner, S., <italic>et al</italic>. (2023) Discovering Language Model Behaviors with Model-Written Evaluations. <italic>Findings of the Association for Computational Linguistics</italic>: <italic>ACL</italic> 2023, Toronto, July 2023, 13387-13434. https://doi.org/10.18653/v1/2023.findings-acl.847 <pub-id pub-id-type="doi">10.18653/v1/2023.findings-acl.847</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2023.findings-acl.847">https://doi.org/10.18653/v1/2023.findings-acl.847</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Perez, E.</string-name>
              <string-name>Ringer, S.</string-name>
              <string-name>Lukosiute, K.</string-name>
              <string-name>Nguyen, K.</string-name>
              <string-name>Chen, E.</string-name>
              <string-name>Heiner, S.</string-name>
              <etal/>
            </person-group>
            <year>2023</year>
            <article-title>Discovering Language Model Behaviors with Model-Written Evaluations</article-title>
            <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
            <pub-id pub-id-type="doi">10.18653/v1/2023.findings-acl.847</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B5">
        <label>5.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Hubinger, E., Van Merwijk, C., Mikulik, V., Skalse, J. and Garrabrant, S. (2019) Risks from Learned Optimization in Advanced Machine Learning Systems.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Hubinger, E.</string-name>
              <string-name>Van Merwijk, C.</string-name>
              <string-name>Mikulik, V.</string-name>
              <string-name>Skalse, J.</string-name>
              <string-name>Garrabrant, S.</string-name>
            </person-group>
            <year>2019</year>
            <article-title>Risks from Learned Optimization in Advanced Machine Learning Systems</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B6">
        <label>6.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Hubinger, E., Denison, C., Mu, J., Lambert, M., <italic>et al</italic>. (2024) Sleeper Agents: Training Deceptive LLMs That Persist through Safety Training.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Hubinger, E.</string-name>
              <string-name>Denison, C.</string-name>
              <string-name>Mu, J.</string-name>
              <string-name>Lambert, M.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>Sleeper Agents: Training Deceptive LLMs That Persist through Safety Training</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B7">
        <label>7.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Ganter, B. and Wille, R. (1999) Formal Concept Analysis—Mathematical Foundations. Springer.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Ganter, B.</string-name>
              <string-name>Wille, R.</string-name>
            </person-group>
            <year>1999</year>
            <article-title>Formal Concept Analysis—Mathematical Foundations</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B8">
        <label>8.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Wille, R. (1992) Concept Lattices and Conceptual Knowledge Systems. <italic>Computers &amp; Mathematics with Applications</italic>, 23, 493-515. https://doi.org/10.1016/0898-1221(92)90120-7 <pub-id pub-id-type="doi">10.1016/0898-1221(92)90120-7</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/0898-1221(92)90120-7">https://doi.org/10.1016/0898-1221(92)90120-7</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Wille, R.</string-name>
            </person-group>
            <year>1992</year>
            <article-title>Concept Lattices and Conceptual Knowledge Systems</article-title>
            <source>Computers &amp; Mathematics with Applications</source>
            <volume>23</volume>
            <fpage>493</fpage>
            <lpage>515</lpage>
            <pub-id pub-id-type="doi">10.1016/0898-1221(92)90120-7</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B9">
        <label>9.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Kouhoué, A.W., Bonavero, Y., Bouétou Bouétou, T. and Huchard, M. (2021) Exploring Variability of Visual Accessibility Options in Operating Systems. <italic>Future Internet</italic>, 13, Article 230. https://doi.org/10.3390/fi13090230 <pub-id pub-id-type="doi">10.3390/fi13090230</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/fi13090230">https://doi.org/10.3390/fi13090230</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Kouhoué, A.W.</string-name>
              <string-name>Bonavero, Y.</string-name>
              <string-name>Bouétou Bouétou, T.</string-name>
              <string-name>Huchard, M.</string-name>
            </person-group>
            <year>2021</year>
            <article-title>Exploring Variability of Visual Accessibility Options in Operating Systems</article-title>
            <source>Future Internet</source>
            <volume>13</volume>
            <elocation-id>230</elocation-id>
            <pub-id pub-id-type="doi">10.3390/fi13090230</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B10">
        <label>10.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Bazin, A., Galasso, J. and Kahn, G. (2024) Polyadic Relational Concept Analysis. <italic>International Journal of Approximate Reasoning</italic>, 164, Article 109067. https://doi.org/10.1016/j.ijar.2023.109067 <pub-id pub-id-type="doi">10.1016/j.ijar.2023.109067</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ijar.2023.109067">https://doi.org/10.1016/j.ijar.2023.109067</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Bazin, A.</string-name>
              <string-name>Galasso, J.</string-name>
              <string-name>Kahn, G.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>Polyadic Relational Concept Analysis</article-title>
            <source>International Journal of Approximate Reasoning</source>
            <volume>164</volume>
            <elocation-id>109067</elocation-id>
            <pub-id pub-id-type="doi">10.1016/j.ijar.2023.109067</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B11">
        <label>11.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Carbonnel, J., Huchard, M. and Nebut, C. (2019) Modelling Equivalence Classes of Feature Models with Concept Lattices to Assist Their Extraction from Product Descriptions. <italic>Journal of Systems and Software</italic>, 152, 1-23. https://doi.org/10.1016/j.jss.2019.02.027 <pub-id pub-id-type="doi">10.1016/j.jss.2019.02.027</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jss.2019.02.027">https://doi.org/10.1016/j.jss.2019.02.027</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Carbonnel, J.</string-name>
              <string-name>Huchard, M.</string-name>
              <string-name>Nebut, C.</string-name>
            </person-group>
            <year>2019</year>
            <article-title>Modelling Equivalence Classes of Feature Models with Concept Lattices to Assist Their Extraction from Product Descriptions</article-title>
            <source>Journal of Systems and Software</source>
            <volume>152</volume>
            <pub-id pub-id-type="doi">10.1016/j.jss.2019.02.027</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B12">
        <label>12.</label>
        <citation-alternatives>
          <mixed-citation publication-type="report">Owoyeye, O. (2025) Automated, Reproducible Pipeline for LLM Vulnerability Discovery: Probe Design, JSON Findings, and Statistical Quality Controls: Case Study of GPT-OSS-20B Vulnerabilities. Handsonlabs Software Academy Technical Report. <ext-link ext-link-type="uri" xlink:href="https://github.com/tobimichigan/Probe-Design-Case-Study-Of-Gpt-Oss-20b-Vulnerabilities/tree/main">https://github.com/tobimichigan/Probe-Design-Case-Study-Of-Gpt-Oss-20b-Vulnerabilities/tree/main</ext-link></mixed-citation>
          <element-citation publication-type="report">
            <person-group person-group-type="author">
              <string-name>Owoyeye, O.</string-name>
            </person-group>
            <year>2025</year>
            <article-title>Automated, Reproducible Pipeline for LLM Vulnerability Discovery: Probe Design, JSON Findings, and Statistical Quality Controls: Case Study of GPT-OSS-20B Vulnerabilities</article-title>
            <source>Handsonlabs Software Academy Technical Report</source>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B13">
        <label>13.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Zhou, C.T., Liu, P.F., Xu, P.X., <italic>et al</italic>. (2024) LIMA: Less Is More for Alignment. 2024 <italic>Advances in Neural Information Processing Systems</italic> (<italic>NeurIPS</italic>), Vancouver, 10-15 December 2024, 55006-55021.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Zhou, C.T.</string-name>
              <string-name>Liu, P.F.</string-name>
              <string-name>Xu, P.X.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>LIMA: Less Is More for Alignment</article-title>
            <source>2024 Advances in Neural Information Processing Systems (NeurIPS)</source>
            <fpage>55006</fpage>
            <lpage>55021</lpage>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B14">
        <label>14.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Perez, E., Huang, S., Song, F., <italic>et al</italic>. (2024) Red Teaming Language Models with Language Models. <italic>Journal of Machine Learning Research</italic>, 25, 1-48.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Perez, E.</string-name>
              <string-name>Huang, S.</string-name>
              <string-name>Song, F.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>Red Teaming Language Models with Language Models</article-title>
            <source>Journal of Machine Learning Research</source>
            <volume>25</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B15">
        <label>15.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Zheng, Y., Chang, C.H., Huang, S.H., Chen, P.Y. and Picek, S. (2024) An Overview of Trustworthy AI: Advances in IP Protection, Privacy-Preserving Federated Learning, Security Verification, and GAI Safety Alignment. <italic>IEEE Journal on Emerging and Selected Topics in Circuits and Systems</italic>, 14, 582-607. https://doi.org/10.1109/JETCAS.2024.3477348 <pub-id pub-id-type="doi">10.1109/JETCAS.2024.3477348</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/JETCAS.2024.3477348">https://doi.org/10.1109/JETCAS.2024.3477348</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Zheng, Y.</string-name>
              <string-name>Chang, C.H.</string-name>
              <string-name>Huang, S.H.</string-name>
              <string-name>Chen, P.Y.</string-name>
              <string-name>Picek, S.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>An Overview of Trustworthy AI: Advances in IP Protection, Privacy-Preserving Federated Learning, Security Verification, and GAI Safety Alignment</article-title>
            <source>IEEE Journal on Emerging and Selected Topics in Circuits and Systems</source>
            <volume>14</volume>
            <pub-id pub-id-type="doi">10.1109/JETCAS.2024.3477348</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B16">
        <label>16.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Gutierrez, A., Huchard, M., Martin, P. and Zhang, H. (2025) Empowering Relational Concept Analysis Using Large Language Model Knowledge Delivery. In: <italic>Lecture Notes in Computer Science</italic>, Springer, 124-139. https://doi.org/10.1007/978-3-032-03364-2_8 <pub-id pub-id-type="doi">10.1007/978-3-032-03364-2_8</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-032-03364-2_8">https://doi.org/10.1007/978-3-032-03364-2_8</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Gutierrez, A.</string-name>
              <string-name>Huchard, M.</string-name>
              <string-name>Martin, P.</string-name>
              <string-name>Zhang, H.</string-name>
            </person-group>
            <year>2025</year>
            <article-title>Empowering Relational Concept Analysis Using Large Language Model Knowledge Delivery</article-title>
            <source>Lecture Notes in Computer Science</source>
            <publisher-name>Springer</publisher-name>
            <fpage>124</fpage>
            <lpage>139</lpage>
            <pub-id pub-id-type="doi">10.1007/978-3-032-03364-2_8</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B17">
        <label>17.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Cocks, V., Diop, A., Flores, O.J., Mendoza, Y., Huchard, M. and Zhang, H.Y. (2025) LLMs Do It All: A Reality Check with Formal Concepts.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Cocks, V.</string-name>
              <string-name>Diop, A.</string-name>
              <string-name>Flores, O.J.</string-name>
              <string-name>Mendoza, Y.</string-name>
              <string-name>Huchard, M.</string-name>
              <string-name>Zhang, H.Y.</string-name>
            </person-group>
            <year>2025</year>
            <article-title>LLMs Do It All: A Reality Check with Formal Concepts</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B18">
        <label>18.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Bazin, A., Gutierrez, A., Huchard, M., Martin, P., <italic>et al</italic>. (2025) Variability-Driven User-Story Generation Using LLM and Triadic Concept Analysis.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Bazin, A.</string-name>
              <string-name>Gutierrez, A.</string-name>
              <string-name>Huchard, M.</string-name>
              <string-name>Martin, P.</string-name>
            </person-group>
            <year>2025</year>
            <article-title>Variability-Driven User-Story Generation Using LLM and Triadic Concept Analysis</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B19">
        <label>19.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Gutierrez, A., Huchard, M., Mondedji, A.D., Sy, D.S., Silvie, P.J. and Martin, P. (2026) Combining Symbolic and Generative AI to Explore Knowledge Base and Control Cabbage Pests in West-Africa. <italic>CORDIALL</italic> 2026 <italic>Digital Agriculture Conference</italic>, Montpellier, 13-17 April 2026, 113.</mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Gutierrez, A.</string-name>
              <string-name>Huchard, M.</string-name>
              <string-name>Mondedji, A.D.</string-name>
              <string-name>Sy, D.S.</string-name>
              <string-name>Silvie, P.J.</string-name>
              <string-name>Martin, P.</string-name>
            </person-group>
            <year>2026</year>
            <article-title>Combining Symbolic and Generative AI to Explore Knowledge Base and Control Cabbage Pests in West-Africa</article-title>
            <source>CORDIALL 2026 Digital Agriculture Conference</source>
            <fpage>113</fpage>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B20">
        <label>20.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Gampel, A. (2026) Streamlining Cybersecurity Risk Assessment for Industrial Control and Automation Systems: Leveraging NIST’s Risk Management Framework (RMF) Implemented Using Model-Based System’s Engineering (MBSE). The George Washington University.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Gampel, A.</string-name>
            </person-group>
            <year>2026</year>
            <article-title>Streamlining Cybersecurity Risk Assessment for Industrial Control and Automation Systems: Leveraging NIST’s Risk Management Framework (RMF) Implemented Using Model-Based System’s Engineering (MBSE)</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
    </ref-list>
  </back>
</article>