Accepted Papers
Vector Embeddings for Images Beyond Neural Networks: An Exhaustive Study on Compact Composite Descriptors

Arpad Kiss, GreenEyes Artificial Intelligence Services, LLC, Lewes, Delaware, USA

ABSTRACT

This research report provides a comprehensive analysis of Compact Composite Descriptors (CCDs) as a highly efficient alternative to deep learning embeddings for Content-Based Image Retrieval (CBIR) in resource-constrained environments. While Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) offer superior semantic performance, their computational overhead and storage requirements—often exceeding 8KB per image—limit their applicability in Edge AI and IoT scenarios. In contrast, engineered descriptors such as the Color and Edge Directivity Descriptor (CEDD), Fuzzy Color and Texture Histogram (FCTH), and Joint Composite Descriptor (JCD) utilize fuzzy inference systems to encode visual features into ultra-compact vectors ranging from 54 to 72 bytes. The study explores the algorithmic foundations of these descriptors, their implementation within the LIRE (Lucene Image Retrieval) framework, and benchmarks demonstrating their competitive retrieval accuracy against MPEG-7 standards. Finally, the report highlights the strategic utility of CCDs for privacy-preserving, low-bandwidth visual search on edge devices, proposing hybrid architectures that leverage the speed of fuzzy composites with the semantic power of neural re-ranking.

KEYWORDS

Computer Vision, Cloud Computing, Embedded Systems, Content-based Image Retrieval Systems
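For a sense of how such byte-scale descriptor vectors are compared at query time, here is a minimal Python sketch of the Tanimoto-coefficient dissimilarity commonly used to match CEDD/FCTH-style quantized histograms (a simplified illustration, not LIRE's exact implementation):

```python
def tanimoto_distance(a, b):
    """Tanimoto-style dissimilarity between two quantized descriptor vectors.

    Returns 0.0 for identical vectors and 1.0 for vectors with no overlap,
    which makes it convenient for ranking compact histogram descriptors.
    """
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a)
    nb = sum(x * x for x in b)
    if na == 0 and nb == 0:
        return 0.0  # two empty descriptors are trivially identical
    return 1.0 - dot / (na + nb - dot)
```

Because each CCD bin is a small integer, the whole comparison stays in integer arithmetic until the final division, which is part of what makes these descriptors cheap to match on edge hardware.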


Cloud-Based Decision Support Systems for Analysing Student Trends in Educational Institutions

Awatef Balobaid and R.Y. Aburasain, Jazan University, KSA

ABSTRACT

This research suggests a new technique to detect and categorize student performance that will assist schools in improving outcomes. A regression-based technique estimates student performance, and a classification model classifies students by performance. It begins with a regression model that predicts student performance. It then utilizes gradient descent to iteratively refine the model and generate better predictions. The model is then cross-validated and retrained on the complete set of data to make it more accurate and helpful in different circumstances. The system organizes students by predicted performance using the regression model. To increase classification accuracy, further optimization is utilized to determine the optimal threshold for splitting performance groups. We assess the method's efficiency in terms of accuracy, response time, scalability, and resource utilization. The findings demonstrate that the new procedure is superior to the old ones. This strategy is robust, versatile, and cost-effective for educational organizations since it can generate correct predictions 95% of the time, react more rapidly, utilize resources economically, and be employed at a large scale. It helps instructors know how their pupils are doing so they may intervene early and make better decisions to support them. Data-based analysis can enhance educational results by utilizing the system's power and ability to adapt to new data.

KEYWORDS

Adaptability, Classification, Data-driven, Educational institutions, Optimization, Performance prediction, Regression, Resource utilization, Scalability, Student outcomes
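As a rough illustration of the pipeline this abstract describes — gradient-descent regression followed by threshold-based grouping — here is a minimal Python sketch (the toy data, learning rate, and 60-point threshold are illustrative assumptions, not values from the paper):

```python
def train_linear_regression(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by iteratively refining w, b with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def classify(predicted_score, threshold=60.0):
    """Split students into performance groups at a tuned threshold."""
    return "high" if predicted_score >= threshold else "at-risk"

# Toy data (hypothetical): weekly study hours -> exam score.
hours = [1, 2, 3, 4, 5, 6]
scores = [35, 45, 55, 65, 75, 85]
w, b = train_linear_regression(hours, scores)
pred = w * 5.5 + b  # predicted score for a student studying 5.5 h/week
group = classify(pred)
```

In the paper's terms, the classification threshold itself would be tuned as a further optimization step rather than fixed in advance as it is here.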


Bridging Misuse Case Modelling and MITRE ATT&CK: A Unified Framework for Threat-Informed Design

Jean-Marie Kabasele Tenday, University ND Kasayi (UKA), Belgium

ABSTRACT

Traditional threat modelling techniques often focus on theoretical or system-specific threats without grounding them in empirical adversarial behaviour. Conversely, frameworks such as MITRE ATT&CK provide rich, intelligence-based taxonomies of real-world attacker tactics, techniques, and procedures (TTPs), but are rarely integrated into early software design phases. This paper proposes a methodology for linking misuse cases—UML-based representations of malicious system interactions—with MITRE ATT&CK techniques, enabling traceability between system-level threats and empirically observed attacks. The proposed framework enhances the relevance, completeness, and operational value of misuse case–based threat modelling. A structured mapping template and example implementation demonstrate how software architects can enrich their security design processes using ATT&CK-informed misuse cases.

KEYWORDS

Misuse case, Mitre ATT&CK, Threat Analysis, Threat Modelling, Cybersecurity, Secure Design.


Vulnerability Analysis of Containerized Web Applications using SAST and DAST Tools

Burak Enes Beygog1 and Ahmet Burak Can2, 1Aselsan Inc., Ankara, Türkiye, 2Hacettepe University, Ankara, Türkiye

ABSTRACT

While containerization has significantly simplified web application deployment, it has simultaneously introduced security blind spots that traditional testing methodologies often fail to address. This study examines the effectiveness of Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), and container scanning tools in identifying vulnerabilities within containerized environments through empirical testing of five open-source tools against three vulnerable applications (DVWA, Juice Shop, and VulnerableApp). Results demonstrate that reliance on any single tool presents substantial risk, with individual tools failing to detect up to 91% of existing vulnerabilities, while each tool category exhibited distinct limitations. Trivy uniquely identified critical infrastructure and supply chain risks, whereas DAST tools including Nikto and OWASP ZAP proved essential for detecting runtime misconfigurations. Notably, authenticated scanning emerged as particularly impactful, enhancing vulnerability detection rates by over 1,400%, thereby underscoring the necessity of implementing a Defense-in-Depth security strategy. Through strategic orchestration of Trivy for infrastructure assessment, authenticated DAST for runtime analysis, and SonarQube for static code analysis, security teams can substantially reduce their vulnerability miss rate to approximately 32%, achieving comprehensive coverage across code, infrastructure, and runtime configuration layers.

KEYWORDS

Container Security, DevSecOps, Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), Software Composition Analysis (SCA), Supply Chain Security


Evaluating Sentiment Models for Cybershield Abusive Language Detection System

Binisa Giri, Hashmath Fathima, Kelechi Nwachukwu and Kofi Nyarko, Department of Electrical and Computer Engineering, Morgan State University, Baltimore, USA

ABSTRACT

CyberShield is an automated, graph-augmented abusive language and interaction detection system designed to identify harmful content, including toxic interactions, hate speech, and the general negative sentiment prevalent on social media platforms. As part of integrating a robust sentiment component into the system, we evaluated four widely used sentiment analysis models — BERT, RoBERTa, VADER, and TextBlob — chosen for their complementary strengths and methodological diversity. BERT and RoBERTa represent transformer architectures capable of capturing contextual meaning in noisy social media texts. VADER provides a lexicon-based model optimized for informal online communication, offering a lightweight alternative to transformers. TextBlob is a traditional NLP baseline to benchmark improvements offered by more contemporary models. Together, this combination allows for a comprehensive comparison across model families, ensuring evidence-based model selection for the CyberShield project. These models were evaluated on a Kaggle dataset containing social media comments labeled with three sentiment classes (negative, positive, neutral) serving as the ground truth. Each model's performance was measured using confusion matrices, accuracy, macro F1, weighted F1, and per-class F1 scores. Our findings show that with an initial sample of 3,000 texts, the classical lexicon-based model (i.e., VADER) and the traditional NLP baseline model (i.e., TextBlob) significantly outperformed the transformer-based models. TextBlob achieved the strongest performance in this phase, underscoring the challenges of applying general pre-trained transformers to real-world sentiment classification without domain-specific fine-tuning. However, after expanding the dataset to 18,318 samples per sentiment class and rerunning the evaluation with the updated RoBERTa sentiment model, the performance trend shifted: the updated RoBERTa model demonstrated substantial improvement and outperformed the earlier transformer results.

KEYWORDS

Abusive Language Detection, Sentiment Analysis, Transformer Models, Lexicon-based models, Social Media Moderation, Performance Metrics


Split-Brain RAG: Why Large Language Models are Not Enough for Scientific Question Answering

Jodi Moselle Alcantara1 and Armielyn Obinguar2, 1Independent Researcher, Pampanga, Philippines, 2 Independent Researcher, Makati, Philippines

ABSTRACT

Large Language Models (LLMs) show promise for information retrieval but face trade-offs between reasoning depth, latency, and cost in scientific question answering (QA). This work evaluates monolithic LLM deployments in Ricerca Paperchat, a Retrieval-Augmented Generation (RAG) system for academic inquiry. We analyze seven models, including Claude Sonnet 4.5, GPT-4o, Gemini Flash, and hosted variants of Qwen and Llama, across accuracy, hallucination resistance, formatting, long-context stability, consistency, and cost. Results show that no single model meets real-time scientific QA requirements: high-reasoning models are slow, while faster models often fail safety checks. We conclude that monolithic LLM architectures are insufficient and propose Split-Brain RAG, a complexity-aware routing approach that reduces latency and cost while maintaining scientific accuracy.

KEYWORDS

Large Language Models, Retrieval-Augmented Generation


Semantic Topology Reasoning Architecture (STRA): From Parameter-Centric Models to Structure-Centric Reasoning

Marcelo Emanuel Paradela Teixeira, Independent Researcher, France

ABSTRACT

Large language models fuse knowledge and reasoning into billions of inscrutable parameters, trading interpretability for performance. We propose Semantic Topology Reasoning Architecture (STRA), which cleanly separates: (1) knowledge as explicit, inspectable semantic topology; (2) reasoning as meta-operations by smaller models (1-7B parameters) trained on topology navigation; (3) language as output interface, not cognitive substrate. This separation enables transparency (visible reasoning paths), efficiency (targeted computation), correctability (edit knowledge without retraining), and genuine cross-domain reasoning through semantic similarity. STRA integrates five primitives: Activation Arrays (working memory), Causal Signatures (cross-domain analogy), Selection Pressure (reasoning stability), Transform Learning (procedural compression), and Semantic Abacus (skill acquisition). These form a complete architecture for transparent, evolvable reasoning that operates on concepts, not tokens.

KEYWORDS

Semantic reasoning, transparent AI, knowledge representation, activation dynamics, explainable AI

Privacy-By-Default: An Industry-Aware Framework for Automated Data Retention at Scale

Sandhya Vinjam, Principal Software Engineer, Texas, USA

ABSTRACT

Data privacy regulations such as GDPR, CCPA, and LGPD impose strict requirements on organizations to automatically delete personally identifiable information (PII) after specified retention periods. However, implementing compliant data retention at scale presents significant architectural and operational challenges, particularly for platforms processing millions of records daily across distributed microservices. This paper presents Privacy-by-Default, an industry-aware framework that automates data retention enforcement without requiring per-merchant configuration. Our framework processes 50,000 daily redaction requests across 5 million user records spanning 12 microservices, achieving 99.7% deletion success rates with sub-3-hour latency. Through industry-specific retention policies and multi-service orchestration, we demonstrate how privacy compliance can be achieved by design rather than by configuration. Evaluation across pharmaceutical, healthcare, retail, and restaurant sectors shows our framework reduces compliance violations by 94%, eliminates manual intervention overhead, and provides audit-ready verification. We estimate our deployment has avoided approximately $4 million in potential regulatory fines while enabling market expansion into regulated jurisdictions.

KEYWORDS

Privacy engineering; GDPR compliance; automated data retention; privacy-by-design; PII redaction; distributed systems; microservices architecture


Economic Impact of Security Failures in Cloud Infrastructure

Sandhya Vinjam, Principal Software Engineer, Texas, USA

ABSTRACT

Security failures in cloud infrastructure result in significant economic losses that extend far beyond immediate breach costs. This paper presents a comprehensive analysis of the economic impact of security failures across cloud service providers, examining direct costs (incident response, system recovery, regulatory fines) and indirect costs (customer churn, reputational damage, market valuation impact). Through analysis of 127 publicly disclosed security incidents affecting cloud infrastructure providers between 2019 and 2024, we quantify the total economic impact at $47.3B, with individual incidents ranging from $2.1M to $4.2B. We develop a predictive model correlating security architecture decisions with economic risk, demonstrating that proactive security investments of $1M-5M can prevent potential losses of $50M-500M. Our findings show that the mean time to detect (MTTD) security incidents has the strongest correlation with total economic impact (r=0.82, p<0.001), suggesting that investment in detection capabilities provides the highest ROI for mitigating financial risk. We present evidence that organizations implementing comprehensive security frameworks achieve 73% lower total cost of incidents and 89% faster recovery times. This work provides quantitative evidence for prioritizing security investments in cloud infrastructure and establishes benchmarks for measuring the economic effectiveness of security programs.

KEYWORDS

Privacy engineering; Economics; Security; Mean time to detect; Cloud Infrastructure.


Real-time Smile Synchronization as a Mechanism for Emotional Contagion in Public Interactive Displays

He-lin Luo and Meng-fan Huang, Graduate Institute of Animation and Film Art, Tainan National University of the Arts, Tainan City, Taiwan

ABSTRACT

Emotional contagion refers to the psychological and behavioral phenomenon in which individuals unconsciously mimic the facial expressions, vocal patterns, postures, and movements of others during social interactions, resulting in corresponding changes in their own emotional states. With the rapid development of digital media and networked communication platforms, emotional transmission is no longer limited to face-to-face interaction, but increasingly mediated through multimodal digital signals such as symbolic icons, animated feedback, visual imagery, and auditory cues. This transformation has positioned emotional contagion as a critical research topic in the fields of Human-Computer Interaction (HCI) and Affective Computing. This study focuses on the transmission of positive emotions, specifically investigating the contagion effect of happiness through an interactive installation titled Quartic Smile. The system was designed to construct a real-time emotional feedback environment in which users can perceive and respond to the emotional expressions of others within a shared interactive space. By integrating real-time facial expression recognition, the system captures smiling behaviors as emotional triggers and translates them into visualized interactive responses, thereby facilitating emotional resonance and collective engagement among participants. To quantitatively evaluate the effectiveness of emotional contagion, two core metrics were defined in this study. The first is the contagion level, which is calculated based on the frequency of smiles and reflects the intensity and distribution of emotional transmission among users. The second is the contagion speed, measured by the cumulative duration of smiling behaviors between participants, representing the temporal dynamics and responsiveness of emotional propagation. 
The experimental results indicate that Quartic Smile effectively enhances positive emotional interaction and demonstrates the potential of real-time interactive systems to shape collective emotional atmospheres and social engagement patterns.

KEYWORDS

Smile Detection, Emotion Recognition, Real-Time System, Public Interactive Installation.


Evaluating Chunking Strategies for Retrieval-augmented Generation in Oil and Gas Enterprise Documents

Samuel Taiwo and Mohd Amaluddin Yusoff, Digital and Innovation Department, Nigeria LNG Limited, Port-Harcourt, Nigeria

ABSTRACT

Retrieval-Augmented Generation (RAG) has emerged as a framework to address the constraints of Large Language Models (LLMs), yet its effectiveness fundamentally hinges on document chunking—an often-overlooked determinant of its quality. This paper presents an empirical study quantifying performance differences across four chunking strategies: fixed-size sliding window, recursive, breakpoint-based semantic, and structure-aware. We evaluated these methods using a proprietary corpus of oil and gas enterprise documents, including text-heavy manuals, table-heavy specifications, and piping and instrumentation diagrams (P&IDs). Our findings show that structure-aware chunking yields higher overall retrieval effectiveness, particularly in top-K metrics, and incurs significantly lower computational costs than semantic or baseline strategies. Crucially, all four methods demonstrated limited effectiveness on P&IDs, underscoring a core limitation of purely text-based RAG within visually and spatially encoded documents. We conclude that while explicit structure preservation is essential for specialised domains, future work must integrate multimodal models to overcome current limitations.

KEYWORDS

RAG, AI, Oil and Gas, Information Retrieval
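To make two of the compared strategies concrete, here is a minimal Python sketch of fixed-size sliding-window and recursive chunking (the character-based sizes and separator list are illustrative assumptions; production RAG pipelines typically count tokens rather than characters):

```python
def sliding_window_chunks(text, size=200, overlap=50):
    """Fixed-size sliding window: chunks of `size` chars, sharing `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def recursive_chunks(text, max_len=200, seps=("\n\n", "\n", ". ", " ")):
    """Recursive splitting: try coarser separators first, recurse on oversized pieces.

    Note this sketch drops the separators themselves; real implementations
    usually reattach or preserve them.
    """
    if len(text) <= max_len:
        return [text]
    for sep in seps:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(recursive_chunks(part, max_len, seps))
            return chunks
    # No separator present: fall back to a hard character cut.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Structure-aware chunking, the strategy the paper favours, would instead split on document elements (headings, tables, list items) rather than on generic separators.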


Prolonging Anti-Deepfake Signatures Lifetime with Blockchain-Based Timestamps

Sohaib Saleem and Pericle Perazzo, University of Pisa, Italy

ABSTRACT

As AI-generated synthetic media, such as deepfake images, proliferate, verifying the authenticity of digital images has become a significant challenge. Traditional digital signature techniques become invalid if images are cropped; therefore, special croppable signatures have been proposed in the literature. However, both traditional and croppable signatures remain valid only as long as their associated public key certificate remains valid. This could be problematic for authenticated images, as they often circulate over the Internet for long periods of time, beyond the expiration of their public key certificates. Re-signing each image with a new key requires redistributing all affected images, and this may be impractical for large-scale systems. To address this issue, we propose an image authentication system with croppability and post-expiration validity features, using BLS (Boneh–Lynn–Shacham) short signatures, the Ethereum blockchain as a decentralized trusted timestamping service, and IPFS (InterPlanetary File System) as a decentralized storage solution. Additionally, we employ two methods: a baseline method, in which the web server hosting the images does not pay any transaction fees, and an optimized method, which produces very little traffic on the web browser. Experimental evaluations are conducted in Pakistan and Italy under real Wi-Fi and simulated 4G cellular connections using Linux traffic control (tc) to demonstrate the system’s performance. Results showed that, in the baseline method, the network traffic overhead and communication delay increase linearly with the image size. Meanwhile, the optimized method achieves constant-time performance for retrieval and verification.

KEYWORDS

Image authentication, BLS signatures, blockchain, decentralized timestamping, IPFS.


AI-based Classification of Meat Freshness using Cantilever Sensor Data

Sebastian Hauschild, Jan-Philipp Schreiter and Horst Hellbruck, Luebeck University of Applied Sciences, Center of Excellence CoSA, Germany

ABSTRACT

A novel approach for determining the freshness of fish and meat involves the use of cantilever sensors, which analyse the concentration of cadaverine on the surface. The cantilever sensor is excited with a voltage sweep around its resonance frequency, and the frequency shift due to deposits on the sensor is measured. In this work, we present a draft of a distributed system and compare AI-based analysis of the stored cantilever sensor data with raw sweep data without preprocessing. We defined a meat quality index (mqi) range for the measurements, which depends on the frequency shift between a reference and a cadaverine measurement. We found that the best practice to predict the mqi value is to use classical machine learning models such as Random Forest, LightGBM, and XGBoost, where Random Forest performs best with a val. / test accuracy of up to 72.01% / 71.67%, precision of 72.37% / 72.53%, recall of 72.01% / 71.67%, and F1-score of 72.06% / 71.72%.

KEYWORDS

Cantilever, Machine Learning, Database, Distributed Systems, Sustainability


AI-Driven Climate Adaptation Models for Predictive Crop Yield Optimization

Venkateshwara Reddy Mudiyala1 and Sai Manvith Reddy Buchi Reddy2, 1Department of Computer Science, New England College, Henniker, New Hampshire, 2Department of Computer Science, University of Bridgeport, Bridgeport, Connecticut

ABSTRACT

Agriculture is increasingly threatened by climate variability and change, necessitating innovative solutions for sustainable crop yield optimization. This paper presents an advanced AI-driven framework for climate adaptation that predicts crop yields by integrating novel machine learning methodologies and applied sciences. The proposed system leverages hybrid machine learning models, incorporating meteorological, soil, and satellite data to deliver precise and actionable insights for farmers and agricultural stakeholders. Our principled solution focuses on novel data fusion architecture, robust feature engineering, and adaptive modeling techniques, demonstrating significant advancements over conventional methods. An in-depth evaluation reveals the framework's ability to enhance decision-making and mitigate the adverse effects of climatic uncertainties.

Spatio-temporal Prediction of Crimes using Predictive Justice Algorithms

Fahil Abdulbasit A. Abdulkareem, Legal Administrative Department, Duhok Polytechnic University, Duhok, Iraqi Kurdistan Region

ABSTRACT

Risk Terrain Modelling (RTM) is a statistical/machine learning technique currently deployed as a software solution to diagnose the socio-environmental conditions that lead to crime in a specific geographic area (the Study Area). It does so by analysing crimes geospatially and temporally, linking them to hotspots, and mining them as big (criminal) data. As a result, new patterns predicting future risk in the surveyed area emerge, enabling a prompt and efficient response by the Predictive Police. Prioritising the use of precautionary resources is necessary in two ways: first, to prevent crime and lessen potential dangers in the event that one does occur; second, to determine what must be done as soon as possible as a preventive measure to control the crime with the least amount of harm to the police forces. Leslie W. Kennedy and Joel M. Caplan founded Risk Terrain Modelling at Rutgers University, and it has been systematically applied in the field of criminal investigation for over ten years. Currently, the model is being tested in more than 45 nations worldwide.

KEYWORDS

Predictive Policing, Near-Repeat Phenomenon, Crime Prediction, Risk Terrain Modelling, Agent-Based Modelling.


Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Zhiyuan Cheng1, Longying Lai2, Yue Liu3, Kai Cheng4 and Xiaoxi Qi5, 1School of Engineering, Stanford University, Stanford, CA, USA, 2Simon Business School, University of Rochester, Rochester, NY, USA, 3Accounting & Information Systems, Rutgers University, Newark, NJ, USA, 4Institute for Social and Economic Research and Policy, Columbia University, New York, NY, USA, 5Department of Economics, Northeastern University, Boston, MA, USA

ABSTRACT

Financial analysts face significant challenges extracting information from lengthy 10-K reports, which often exceed 100 pages. This paper presents a Retrieval-Augmented Generation (RAG) system designed to answer questions about S&P 500 financial reports and evaluates the impact of neural reranking on system performance. Our pipeline employs hybrid search combining full-text and semantic retrieval, followed by an optional reranking stage using a cross-encoder model. We conduct systematic evaluation using the FinDER benchmark dataset, comprising 1,500 queries across five experimental groups. Results demonstrate that reranking significantly improves answer quality, achieving 49.0 percent correctness compared to 33.5 percent without reranking (15.5 percentage point improvement). The error rate decreases from 35.3 percent to 22.5 percent (12.8 percentage point reduction), and average scores improve from 4.95 to 6.02 (21.6 percent relative improvement). Our findings emphasize the critical role of reranking in financial RAG systems and demonstrate performance improvements over baseline methods through modern language models and refined retrieval strategies.

KEYWORDS

Retrieval-Augmented Generation, Financial Document Analysis, Question Answering, Neural Reranking, 10-K Reports.
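A schematic Python sketch of the two-stage pipeline this abstract describes — hybrid first-stage retrieval followed by reranking — where `embed` and `cross_scorer` are hypothetical placeholders standing in for the semantic retriever and cross-encoder models, not the paper's actual components:

```python
def keyword_score(query, doc):
    """Full-text signal: fraction of query terms present in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_retrieve(query, docs, embed, k=5):
    """First stage: blend keyword overlap with a semantic score, keep top-k."""
    scored = [(0.5 * keyword_score(query, d) + 0.5 * embed(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

def rerank(query, candidates, cross_scorer):
    """Second stage: re-order the shortlist with a cross-encoder-style score
    computed jointly over (query, document) pairs."""
    return sorted(candidates, key=lambda d: cross_scorer(query, d), reverse=True)
```

The key design point is that the expensive pairwise scorer only sees the short candidate list from the cheap first stage, which is what makes cross-encoder reranking affordable over 100-plus-page filings.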


Reach Us

sigml@ccseit2026.org


sigmlconfe@yahoo.com
