AGENTICS 2025 Abstracts


Area 1 - Risks and Ethical Concerns

Full Papers
Paper Nr: 42
Title:

LLM-based Multi-Agent Systems: Frameworks, Evaluation, Open Challenges, and Research Frontiers

Authors:

Soharab Hossain Shaikh

Abstract: Large Language Models (LLMs) have propelled the emergence of sophisticated Multi-Agent Systems (MAS) that leverage language-driven reasoning, collaboration, and autonomous decision-making. This paper presents a comprehensive review of state-of-the-art LLM-based frameworks for building MAS - including AutoGen, CrewAI, CAMEL, ChatDev, LangGraph, and Google's Agent Development Kit (ADK) - highlighting their architectural designs, agent coordination mechanisms, and operational strengths. We critically analyze fundamental challenges such as dynamic task decomposition, persistent multi-agent memory, communication efficiency, emergent collective behaviours, explainability, trust, and the lack of standardized evaluation benchmarks. Drawing from recent advances, we identify promising research frontiers encompassing meta-learning for adaptive task assignment, multimodal grounding to bridge language and perception, swarm-inspired emergent phenomena, and robust memory architectures supporting long-term agent continuity. By synthesizing these insights, this work offers a diagnostic and prescriptive roadmap to enhance the scalability, interpretability, and resilience of LLM-powered multi-agent systems, thereby accelerating the development of robust, collaborative AI agents for real-world deployment.

Paper Nr: 86
Title:

LLM-Based Risk Scenario Generation and Mitigation for AI Systems: A Case Study Approach

Authors:

Arisa Morozumi and Hisashi Hayashi

Abstract: As Artificial Intelligence (AI) systems become increasingly integrated into critical domains, conventional risk management methodologies often prove inadequate for addressing their unique and complex challenges, particularly the emergence of novel, unforeseen risks. To address this gap, this paper introduces the LLM-Based AI Risk Management Framework, a structured four-step process that systematically leverages Large Language Models (LLMs) to enhance risk identification and analysis. The framework's efficacy is demonstrated through a detailed case study of an AI-powered matching system and an empirical validation of its core prompt engineering techniques. The results show that this approach enables the generation of comprehensive risk scenarios, including critical compliance and ethical issues initially overlooked by human experts, thereby serving as an objective counter-perspective to organizational biases. The study reveals that the framework's success hinges on a sophisticated human-in-the-loop model where human experts provide strategic direction, not just passive validation. A key finding is that the quality of LLM outputs is dramatically improved by framing requests as concrete 'incident scenarios' instead of abstract 'risks'. This research contributes an empirically-grounded methodology for integrating LLMs into AI governance, demonstrating that the strategic partnership between human expertise and LLM capabilities can foster a more robust, responsible, and safe approach to managing AI systems.

Short Papers
Paper Nr: 24
Title:

Towards SAFE AI Agentic System

Authors:

Golnoosh Babaei, Paolo Giudici, Alessandro Piergallini and Rasha Zieni

Abstract: The growing complexity of decision-making systems has led to the need for intelligent, adaptive, and transparent AI-driven solutions. This paper proposes SAFE multi-agent AI systems based on generative artificial intelligence, which are instructed with target performance metrics that address not only accuracy but also robustness and explainability, thus making them Sustainable, Accurate, Fair and Explainable. Each agent operates as an employee responsible for making decisions based on the comparison between performance metrics and given risk thresholds. Through our experiments and simulations, we demonstrate that SAFE agentic AI models can be effectively implemented.

Paper Nr: 27
Title:

HADA: Human-AI Agent Decision Alignment Architecture

Authors:

Tapio Pitkäranta and Leena Pitkäranta

Abstract: Problem & Motivation. The generative AI boom is spawning rapid deployment of diverse LLM software agents. New standards such as the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocols let agents share data and tasks, yet organizations still lack a rigorous way to keep those agents — and legacy algorithms — aligned with organizational targets and values. Objectives of the Solution. We aim to deliver a software reference architecture that (i) provides every stakeholder with natural-language interaction, across planning horizons, with software agents and AI algorithmic logic, (ii) provides a multidimensional way of aligning stakeholder targets and values with algorithms and agents, (iii) provides an example of jointly modelling AI algorithms, software agents, and LLMs, (iv) provides a way for stakeholder interaction and alignment across time scales, (v) scales to thousands of algorithms and agents while remaining auditable, and (vi) remains framework-agnostic, allowing the use of any underlying LLM, agent library, or orchestration stack without requiring redesign. Design & Development. Guided by the Design-Science Research Methodology (DSRM), we engineered HADA (Human-Algorithm Decision Alignment)—a protocol-agnostic, multi-agent architecture that layers role-specific interaction agents over both Large Language Models and legacy decision algorithms. Our reference implementation containerises a production credit-scoring model, getLoanDecision, and exposes it through stakeholder agents (business manager, data scientist, auditor, ethics lead and customer), enabling each role to steer, audit and contest every decision via natural-language dialogue. The resulting constructs, design principles and justificatory knowledge are synthesised into a mid-range design theory that generalises beyond the banking pilot. Demonstration. HADA is instantiated on a cloud-native stack—Docker, Kubernetes and Python—and embedded in a retail-bank sandbox. Five scripted scenarios show how business targets, algorithmic parameters, decision explanations and ethics triggers propagate end-to-end through the HADA architecture. Evaluation. Walkthrough observation and log inspection were used to gauge HADA against six predefined objectives. A stakeholder–objective coverage matrix showed 100% fulfilment: every role could invoke conversational control, trace KPIs and values, detect and correct bias (ZIP-code case), and reproduce decision lineage—without dependence on a particular agent hierarchy or LLM provider. Contributions. The research delivers (i) an open-source HADA reference architecture, (ii) an evaluated mid-range design theory for human–AI alignment in multi-agent settings, and (iii) empirical evidence that framework-agnostic, protocol-compliant stakeholder agents can simultaneously enhance accuracy, transparency and ethical compliance in real-world decision pipelines.

Paper Nr: 54
Title:

Evaluating Agentic AI Through a Geopolitical Simulation Sandbox

Authors:

Jean Sebastien Dessureault, Alain Thierry Iliho Manzi, Soukaina Alaoui Ismaili, Alitiana Mijoro Barisoa, Donald Yankam Djioke, Alex Bergeron, Assefa Yared-Amare, Mireille Lalancette and Éric Bélanger

Abstract: Intelligent agents are the technology of the moment, with their ability to interact with their environment according to the objectives set for them. Their immense potential makes them a disruptive technology, and they will undoubtedly have a global impact on the job market. This technology will bring its share of benefits, but also its share of challenges. With this in mind, it is crucial to understand, evaluate, and circumscribe this technology to optimise its positive effects while also mitigating its adverse effects. This is no easy task, since agents, especially generative agents, are not deterministic, and their quantitative evaluation poses a challenge. This paper presents a geopolitical simulator used as a testbed for the development, evaluation, and circumscription of Large Language Model (LLM) based intelligent agents. A fictitious world map of 20 countries enables agents and their hierarchies to interact, communicate, negotiate, and collaborate to achieve their goals. Agents’ behaviours and decisions are observed, evaluated, and quantified using methods that implement metrics to describe their level of ethics, collaboration, negotiation, creativity, and so on. The simulation is carried out using a hierarchy of classes, instantiated as objects, whose different variables have a causal and cascading impact on one another. Eventually, the various agents, based on popular LLMs (ChatGPT, Claude, Mistral, and others), will be compared and classified according to each of the metrics, providing the community with more in-depth knowledge of each of the major models.

Paper Nr: 35
Title:

Protocol-Driven Agentic Integration of LLMs: A Multi-Tool Use Case

Authors:

Pinar Ersoy and Mustafa Erşahin

Abstract: Agentic architectures that ensemble specialized AI agents offer modularity, parallel throughput, and fault isolation for complex workflows yet demand rigorous integration layers to connect large-language-model (LLM) reasoning with external tools in a reproducible, auditable manner. We present a comprehensive integration of Anthropic’s Model Context Protocol (MCP)—a transport-agnostic JSON-RPC 2.0 framework whose machine-readable schemas advertise typed capabilities and unified error codes—with LangGraph. This state-preserving orchestrator models agent workflows as directed graphs of concurrent nodes. An industrial-grade MCP server implemented in Python 3.11 enforces strict JSON-Schema validation, cryptographic authentication, and exponential back-off retries. It is deployed as a Helm-packaged container for bit-for-bit reproducibility. The stack is evaluated in four production-realistic scenarios on a homogeneous Kubernetes cluster. First, an enterprise knowledge-base assistant couples an LLM planner with a vector-search MCP resource to sustain sub-second P95 responses in the documents while isolating schema violations. Second, an autonomous data-analysis pipeline interleaves sandboxed Python evaluation with on-the-fly SVG charting, generating FAIR-compliant provenance bundles that ensure data are findable, accessible, interoperable, and reusable. Third, a cross-agent scheduling workflow integrates Google Agent-to-Agent delegation with vertical MCP tool calls to Microsoft Graph, upholding ISO 27001 segregation of duties and sub-second end-to-end latency under continuous soak testing. Finally, an automated pharmacovigilance pipeline streams PubMed abstracts, extracts clinical metrics, and compiles Periodic Safety Update Reports whose SHA-256-backed audit trail meets EMA Good Pharmacovigilance Practice and CFR Part 11 electronic records requirements. Across these cases, MCP-enabled graphs scale linearly until external services saturate, degrade gracefully under injected faults, and retain end-to-end traceability. All code artifacts, container charts, and telemetry dashboards are released to foster replication, establishing the LangGraph and MCP stack as a robust foundation for compliant, high-throughput multi-agent systems.
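
As an illustrative aside (not code from the paper), the sketch below shows the general shape of a JSON-RPC 2.0 tool call of the kind MCP standardizes, with JSON-Schema validation of the result and exponential back-off retries; the endpoint URL, tool name, result schema, and the "tools/call" method name are assumptions for illustration only.

    # Illustrative sketch only: a JSON-RPC 2.0 tool call with result-schema
    # validation and exponential back-off. URL, tool name, schema, and method
    # name are hypothetical placeholders, not the paper's implementation.
    import time
    import requests
    from jsonschema import validate, ValidationError

    RESULT_SCHEMA = {"type": "object",
                     "properties": {"documents": {"type": "array"}},
                     "required": ["documents"]}

    def call_tool(url, tool_name, arguments, max_retries=4):
        payload = {"jsonrpc": "2.0", "id": 1,
                   "method": "tools/call",                 # MCP-style method name (assumption)
                   "params": {"name": tool_name, "arguments": arguments}}
        for attempt in range(max_retries):
            try:
                resp = requests.post(url, json=payload, timeout=5)
                resp.raise_for_status()
                result = resp.json().get("result", {})
                validate(instance=result, schema=RESULT_SCHEMA)  # isolate schema violations
                return result
            except (requests.RequestException, ValidationError):
                time.sleep(2 ** attempt)                    # back off 1s, 2s, 4s, ...
        raise RuntimeError(f"{tool_name} failed after {max_retries} attempts")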

Area 2 - Models

Full Papers
Paper Nr: 49
Title:

Towards Interpretable Automated Question Answering Model Evaluation and Comparison

Authors:

Ricardo Saraiva Grava, Anarosa Alves Franco Brandão, Sarajane Marques Peres and Fábio Gagliardi Cozman

Abstract: Question Answering is held as one of the most applicable natural language generation tasks, assisting users in search of objective answers to specific questions, such as in chatbots or search engines. Large Language Models have been shown to surpass previous state-of-the-art performance in the task, and yet, progress in the field is hard to track due to a lack of reliable automated model evaluation methods. The traditional evaluation metrics are known to produce results that do not correlate well with human judgement and lack interpretability. In this work, we push for more interpretable evaluation by means of adversarial attacks that test answerers against stressful inputs, which may confuse a model but not a human being. These include the insertion of typographical mistakes, word swaps, and the addition of unrelated context. We employ the Q3AE method to do so, and test it on models of the LLaMA and Gemma architectures, showcasing the additional feedback attacks can provide when compared to the use of traditional metrics. We find that the models can be surprisingly brittle when exposed to character-level attacks.
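
Purely as an illustration of the kinds of stress inputs mentioned above (not the Q3AE implementation itself), minimal character-level and word-level perturbations of a question might look like this:

    # Illustrative sketch of typo, word-swap, and added-context perturbations;
    # not the Q3AE code. The example question and distractor are placeholders.
    import random

    def add_typo(text, seed=0):
        rng = random.Random(seed)
        i = rng.randrange(len(text) - 1)
        chars = list(text)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]   # swap two adjacent characters
        return "".join(chars)

    def swap_words(text, seed=0):
        rng = random.Random(seed)
        words = text.split()
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]   # swap two adjacent words
        return " ".join(words)

    def add_unrelated_context(question, distractor):
        return f"{distractor} {question}"                 # prepend irrelevant context

    question = "What year did the Apollo 11 mission land on the Moon?"
    print(add_typo(question), swap_words(question), sep="\n")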

Paper Nr: 101
Title:

Memory Strategies for LLM Agents: A Comparative Study of In-Context and Episodic Architectures

Authors:

Frances A. Santos, Leandro A. Villas and Julio Cesar dos Reis

Abstract: A core challenge in developing conversational agents with Large Language Models (LLMs) is their inability to retain information across interactions. To address this challenge, this work investigates the impact of two memory strategies, in-context memory and episodic memory, using Random Write with K-Nearest Neighbors (K-NN) retrieval for multi-turn interactions, where coherence and context continuity are essential for the performance of LLM-based agents. We evaluate each approach individually and in combination using the OpenAssistant Conversations (OASST1) dataset, measuring answer correctness, latency, cost, and memory usage. Our results show that memory-augmented agents significantly outperform the baseline, with in-context memory reducing incorrect responses by 57%. The hybrid strategy achieves the highest overall correctness with minimal impact on latency and cost. These findings underscore the importance of memory mechanisms in building more adaptive, context-aware conversational agents.
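
As a hedged sketch of what episodic memory with K-NN retrieval over past turns could look like (an illustration, not the authors' implementation; the embedding function is a placeholder):

    # Illustrative sketch of episodic memory with K-NN retrieval; embed() is a
    # placeholder for any sentence-embedding model, not the authors' code.
    import numpy as np

    class EpisodicMemory:
        def __init__(self, embed):
            self.embed = embed                  # callable: str -> 1-D numpy vector
            self.texts, self.vectors = [], []

        def write(self, turn_text):
            self.texts.append(turn_text)
            self.vectors.append(self.embed(turn_text))

        def retrieve(self, query, k=3):
            if not self.vectors:
                return []
            q = self.embed(query)
            mat = np.vstack(self.vectors)
            sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
            top = np.argsort(-sims)[:k]         # indices of the k most similar past turns
            return [self.texts[i] for i in top]

    # Retrieved turns would then be prepended to the prompt of the next interaction.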

Paper Nr: 142
Title:

Leveraging LLM Reflection to Improve Small Language Model Agents' Capabilities

Authors:

Aissa Hadj Mohamed, Leandro A. Villas and Julio Cesar dos Reis

Abstract: In recent years, the emergence of Small Language Models (SLMs) has opened new possibilities for deploying lightweight, efficient AI systems across a range of applications. However, SLMs tend to produce limited outputs for tasks that require complex reasoning and self-evaluation. Their limited capacity often results in suboptimal effectiveness, particularly in user-aligned generation tasks. This study introduces a novel framework that leverages Large Language Models (LLMs) to provide reflective feedback on the outputs generated by SLM-based autonomous agents. Given a user prompt, the SLM agent produces an initial response, which is then evaluated by the LLM in the context of user feedback to produce a reflection. This reflection is stored in a dynamic reflection memory module, providing a growing repository of corrective insights tailored to the SLM’s operational history. These reflections are then used to guide the SLM agent toward improved responses to subsequent user prompts. Our experimental evaluations, conducted on the small language models Deepseek-8B (8 billion parameters) and Phi-2 (2.7 billion parameters) on the ARC challenge dataset, demonstrate improvements in output quality compared to the baseline approach, where the SLM agents self-reflect on their outputs. When prompted with external reflections, Deepseek-8B increases its accuracy score from 62.78% to 81.38%, versus 77.70% with its self-reflections. Our findings highlight the effectiveness of externalized reflection memory as an augmentation strategy to enhance SLM agents’ outcomes without increasing inference-time cost.
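
As a sketch of the external-reflection loop described above (illustrative only; slm_generate() and llm_reflect() are hypothetical stand-ins for calls to a small and a large language model, respectively):

    # Illustrative sketch, not the paper's code: the SLM answers, the larger LLM
    # critiques the draft, and the critique (not the answer) is stored for reuse.
    def answer_with_reflections(prompt, memory, slm_generate, llm_reflect):
        guidance = "\n".join(memory[-5:])          # prepend recent corrective insights
        draft = slm_generate(f"{guidance}\n\nTask: {prompt}")
        reflection = llm_reflect(prompt, draft)    # external critique of the SLM's draft
        memory.append(reflection)
        return draft

    memory = []  # grows into a repository of reflections tailored to the SLM's history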

Short Papers
Paper Nr: 25
Title:

Baba Is LLM: Reasoning in a Game with Dynamic Rules

Authors:

Fien van Wetten, Aske Plaat and Max van Duijn

Abstract: Large language models (LLMs) are known to perform well on language tasks, but struggle with reasoning tasks. This paper explores the ability of LLMs to play the 2D puzzle game Baba is You, in which players manipulate rules by rearranging text blocks that define object properties. Given that this rule-manipulation relies on language abilities and reasoning, it is a compelling challenge for LLMs. Six LLMs are evaluated using different prompt types, including (1) simple, (2) rule-extended and (3) action-extended prompts. In addition, two models (Mistral, OLMo) are finetuned using textual and structural data from the game. Results show that while larger models (particularly GPT-4o) perform better in reasoning and puzzle solving, smaller unadapted models struggle to recognize game mechanics or apply rule changes. Finetuning improves the ability to analyze the game levels, but does not significantly improve solution formulation. We conclude that even for state-of-the-art and finetuned LLMs, reasoning about dynamic rule changes is difficult (specifically, understanding the use-mention distinction). The results provide insights into the applicability of LLMs to complex problem-solving tasks and highlight the suitability of games with dynamically changing rules for testing reasoning and reflection by LLMs.

Paper Nr: 87
Title:

Trustworthy AI in Design: Introducing Explainable Agent Systems

Authors:

Emanuel Ribeiro, Tiago Pinto, Arsénio Reis and João Barroso

Abstract: As industrial product development becomes increasingly complex and knowledge-intensive, the integration of Artificial Intelligence (AI) agents into design workflows offers great potential to improve efficiency and decision making. However, the opacity of current AI reasoning processes remains a major obstacle for adoption in engineering domains. This position paper explores the need for Explainable AI (XAI) within agentic design systems, proposing a conceptual architecture where agents, powered by Large Language Models (LLMs), not only perform domain-specific tasks, but also generate human-readable justifications for their decisions. Unlike black-box systems, these agents are designed to promote transparency, trust, and traceability, all of which are critical in high-stakes industrial contexts. Building upon the foundation of the Agentic Approach to Product Design, we outline how roles such as requirement analysis, material selection, and specification interpretation can be reimagined with explainability at their core. This work advocates for a shift towards interpretable, auditable AI assistants, capable of supporting collaborative engineering processes. An illustrative scenario is used to exemplify the practical value and challenges of agents supported by XAI. Future research directions are highlighted, including evaluation metrics for explainability and potential integrations into existing agent orchestration platforms such as CrewAI. As a conceptual position paper, this work aims to stimulate the development of explainable multi-agent design systems and guide future empirical validation in industrial contexts.

Paper Nr: 97
Title:

NoRMA: A Multi-Agent Communication-Centric Dataset for Enhanced Customs Nomenclature Classification

Authors:

Hicham Bouchtib, Kaouter Karboub, Mohamed Tabaa and Mohamed Hamlich

Abstract: Large Language Model-based agents can offer entry-level consulting in customs services to both importers and exporters. Nevertheless, current general-domain benchmark datasets cannot fully capture the complexity of real-world customs awareness required for precise decision-making. To address this, Multi-Agent Systems (MAS) have emerged as a promising approach to reduce the likelihood of LLM hallucination when LLMs act as artificial intelligence assistants. Given that Harmonized System (HS) code determination is essential for accurate cost estimation, customs declaration, and regulatory compliance for any kind of firm, this paper introduces NoRMA (Nomenclature oRganization for Multi-agent Assistance) and NoRMA-base, two novel high-quality customs datasets consisting of 3424 curated entries each, originally obtained from the official Moroccan Customs and Indirect Tax Administration (Administration des Douanes et Impôts Indirects, ADII) and focusing on two chapters: organic chemicals and inorganic chemicals. The datasets were subsequently adapted for direct integration with Multi-Agent Systems (MAS) for evaluation and fine-tuning purposes. An initial evaluation with heavyweight LLMs such as GPT-4o, DeepSeek R1 671B, and LLaMA 3 405B shows that they achieve only 55% accuracy on the first 100 questions, justifying the need for domain-specific fine-tuning datasets. To the best of our knowledge, few studies analyze the feature-level combination of LLMs and state-of-the-art Multi-Agent Systems for customs nomenclature classification, which underscores the relevance of our paper and its findings. The datasets are publicly available: https://huggingface.co/datasets/hichambht32/NoRMA.

Paper Nr: 22
Title:

ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents

Authors:

Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju and James A. Burke

Abstract: Recent advancements in Large Language Models (LLMs) and the introduction of the Model Context Protocol (MCP) have significantly expanded LLM agents’ capability to interact dynamically with external tools and APIs. Existing frameworks lack MCP integration, relying on error-prone manual updates to monolithic repositories, causing duplication and inefficiency. Additionally, current approaches abstract tool selection before the LLM agent is invoked, limiting its autonomy and hindering dynamic re-querying capabilities during multi-turn interactions. To address these issues, we introduce ScaleMCP, a novel tool selection approach that dynamically equips LLM agents with an MCP tool retriever, giving agents the autonomy to add tools into their memory, as well as an auto-synchronizing tool storage system pipeline through CRUD (create, read, update, delete) operations with MCP servers as the single source of truth. We also propose a novel embedding strategy, Tool Document Weighted Average (TDWA), designed to selectively emphasize critical components of tool documents (e.g. tool name or synthetic questions) during the embedding process. Comprehensive evaluations conducted on a newly created ScaleMCP benchmark of 5,000 financial metric MCP servers, across 10 LLM models, 5 embedding models, and 5 retriever types, demonstrate substantial improvements in tool retrieval and LLM agent performance, emphasizing ScaleMCP’s effectiveness in scalable, dynamic tool selection and invocation.
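
As a sketch of what a weighted-average embedding over tool-document components could look like (illustrative only; the component weights and random vectors are placeholders, not the values or embeddings used in the paper):

    # Illustrative sketch of a TDWA-style embedding: component vectors (e.g. tool
    # name, description, synthetic questions) are combined with per-component
    # weights and unit-normalized. Weights here are arbitrary placeholders.
    import numpy as np

    def tdwa_embedding(component_vectors, weights):
        # component_vectors: dict[str, np.ndarray], weights: dict[str, float]
        total = sum(weights.values())
        vec = sum(weights[c] * component_vectors[c] for c in component_vectors) / total
        return vec / (np.linalg.norm(vec) + 1e-9)   # unit-normalize for cosine retrieval

    dim = 8
    components = {"name": np.random.rand(dim),
                  "description": np.random.rand(dim),
                  "synthetic_questions": np.random.rand(dim)}
    weights = {"name": 0.5, "description": 0.2, "synthetic_questions": 0.3}
    print(tdwa_embedding(components, weights))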

Paper Nr: 58
Title:

Energy-Aware Routing in Sensor Networks Using Static and Dynamic Agent Selection

Authors:

Prapulla S B, N U Shivaraja, Manyamala Sunaina, Tanisha Srivastava, Deepamala N, Amit Sata and Smriti Srivastava

Abstract: Energy efficiency remains a key issue in wireless sensor networks (WSNs), given the limited energy resources of sensor nodes and the drawbacks of conventional protocols such as LEACH. To overcome these limitations, we propose a hybrid routing protocol for optimized data collection that combines static and dynamic agents. Static agents, called Local Data Collector Agents (LDCAs), are selected using a fuzzy decision-based scheme that takes into account key parameters, in particular residual energy, node centrality, and agent speed, providing balanced and effective clustering. To expand coverage and minimize energy holes, dynamic agents—Global Data Collector Agents (GDCAs)—follow a dual mobility model comprising spiral and star patterns. Additionally, a Random Forest model trained on the same parameters is included to optimize decision-making and increase the accuracy and adaptability of LDCA selection. The integration of machine learning not only improves agent selection, but also improves protocol scalability and responsiveness under dynamic network conditions. Simulation results show that the proposed protocol significantly reduces energy consumption and improves network stability and adaptability compared to LEACH. This dual-agent architecture lays the groundwork for dynamic decision-making in future WSN deployments while ensuring quality-of-service-aware transmission and an extended network lifespan.
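
Purely as an illustration of multi-criteria LDCA candidate scoring over the parameters listed above (residual energy, node centrality, agent speed), the sketch below uses a crisp weighted sum with placeholder weights; it is not the paper's fuzzy inference or Random Forest logic.

    # Illustrative sketch: rank LDCA candidates by a weighted combination of
    # normalized criteria. Weights and the preference for slower agents are
    # assumptions for illustration, not the protocol's actual rules.
    def ldca_score(node, weights=(0.5, 0.3, 0.2)):
        w_energy, w_centrality, w_speed = weights
        return (w_energy * node["residual_energy"]       # inputs normalized to [0, 1]
                + w_centrality * node["centrality"]
                + w_speed * (1.0 - node["speed"]))        # slower (more stable) preferred

    candidates = [
        {"id": 1, "residual_energy": 0.9, "centrality": 0.4, "speed": 0.2},
        {"id": 2, "residual_energy": 0.6, "centrality": 0.8, "speed": 0.5},
    ]
    best = max(candidates, key=ldca_score)
    print("selected LDCA:", best["id"])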

Paper Nr: 120
Title:

Agent-Based Business Process Design: A Declarative Approach Beyond Procedural Workflows

Authors:

Mohammad Azarijafari, Luisa Mich and Michele Missikoff

Abstract: Business processes (BPs) are key elements of a company’s organization. Over recent decades, researchers and managers have invested significant effort in analysing, designing, and optimising BPs. A relevant role in BP-related projects is played by digital technology, used to automate the activities involved as much as possible, from those dedicated to analysis and modelling to re-design and implementation. However, existing methodologies are challenged by a number of critical issues, the most relevant being the inherent complexity of analysis and design, a lack of flexibility, and difficulties in dealing with multifaceted business environments. To address these problems, in this paper we propose a design methodology based on Conceptual Modelling and a declarative formalization of BPs with extensive use of artificial intelligence agents. The core of the methodology is given by three foundational notions: AI Agents, Business Objects, and Goals. The proposed agentic approach, referred to as AGO (Agent, Goal, Object), is illustrated with an example that highlights the shift of the proposed paradigm. By moving away from the traditional task-oriented approach, the AGO methodology marks a foundational step toward effective agentic business process management.

Area 3 - Applications

Full Papers
Paper Nr: 33
Title:

A Study on Multi-Agent Collaboration for Business Process Automation in Enterprise Resource Planning Systems

Authors:

Jonas Schnepf, Matthias Schwarz, Bernd Scheuermann and Simon Anderer

Abstract: Automating business processes in enterprise resource planning (ERP) systems offers significant potential to reduce manual workload and improve efficiency. The limitations of robotic process automation in handling complex business processes have sparked an increasing interest in artificial intelligence-based solutions, particularly those using large language models (LLMs). Although existing approaches utilize LLM-based agents for business process automation, they typically do not leverage the benefits of multi-agent collaboration, such as divergent thinking or reasoning. Additionally, important security principles like segregation of duties (SoD) are not considered. Therefore, this study addresses these gaps by proposing the use of multiple LLM-based agents for automating ERP business processes. An error-driven prompt engineering approach is presented to design reliable business process automation involving multiple agents. This approach is applied to the procure-to-pay process, taking into account different multi-agent scenarios to incorporate SoD. Through error-driven prompt engineering, a consistently reliable automation was achieved. Overall, this study aims to facilitate the use of ERP systems in a collaborative environment while ensuring compliance with fundamental security principles.

Paper Nr: 57
Title:

Prompt-Augmentation for Evolving Heuristics for a Niche Optimization Problem

Authors:

Thomas Bömer, Nico Koltermann, Max Disselnmeyer, Laura Dörr and Anne Meyer

Abstract: Combinatorial optimization problems often rely on heuristic algorithms to generate efficient solutions. However, the manual design of heuristics is resource-intensive and constrained by the designer’s expertise. Recent advances in artificial intelligence, particularly large language models (LLMs), have demonstrated the potential to automate heuristic generation through evolutionary frameworks. Recent works focus only on well-known combinatorial optimization problems like the traveling salesman problem and online bin packing problem when designing constructive heuristics. This study investigates whether LLMs can effectively generate heuristics for niche, not yet broadly researched optimization problems, using the unit-load pre-marshalling problem as an example case. Building on the Evolution of Heuristics (EoH) framework, we introduce two prompt augmentation strategies: Contextual EoH (CEoH), which incorporates problem-specific descriptions to enhance in-context learning, and Literature-Based CEoH (LitCEoH), which integrates heuristic insights drawn from domain literature via a novel prompt design. We conduct extensive computational experiments comparing EoH, CEoH, and LitCEoH across a wide range of problem instances. Results show that CEoH and LitCEoH enable smaller LLMs to generate high-quality heuristics more consistently and even outperform larger models. Further, we find LitCEoH to improve scalability to diverse instance configurations. The code is available: https://github.com/nico-koltermann/LitCEoH
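
As a sketch of what augmenting an evolutionary heuristic-generation prompt with problem context and literature insights might look like (illustrative only; all text fragments are placeholders, not the paper's actual prompts):

    # Illustrative sketch of CEoH/LitCEoH-style prompt augmentation: a base
    # EoH-style request is extended with a problem description and, optionally,
    # literature-derived insights. Fragments below are placeholders.
    BASE_TASK = ("Write a Python scoring function priority(state, action) that a "
                 "constructive heuristic can use to choose the next move.")

    PROBLEM_CONTEXT = ("Problem: unit-load pre-marshalling. Reorder unit loads in a "
                       "block-stacking warehouse so that no blocking loads remain, "
                       "using as few moves as possible.")

    LITERATURE_INSIGHTS = ("Known insight: prefer moves that resolve existing blockages "
                           "without creating new ones; avoid stacking onto items with "
                           "earlier retrieval priority.")

    def build_prompt(parent_heuristics, use_context=True, use_literature=False):
        parts = [BASE_TASK]
        if use_context:
            parts.append(PROBLEM_CONTEXT)        # CEoH: add problem-specific description
        if use_literature:
            parts.append(LITERATURE_INSIGHTS)    # LitCEoH: add domain-literature insights
        parts.append("Parent heuristics to improve upon:\n" + "\n".join(parent_heuristics))
        return "\n\n".join(parts)

    print(build_prompt(["def priority(state, action): return 0  # baseline"],
                       use_literature=True))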

Paper Nr: 94
Title:

Solidity Meets LLMs: A Transformer-Based Approach to Smart Contract Vulnerability Detection

Authors:

Djamel Eddine Hakim Ghorab, Farid Mokhati and Mostafa Anouar Ghorab

Abstract: The growing adoption of blockchain technologies, particularly the Ethereum platform, has amplified the critical role of smart contracts in decentralized applications. However, the increasing complexity and financial value of these contracts make them prime targets for cyber attacks. In this work, we present a transformer-based approach for the detection of vulnerabilities in smart contract fragments written in Solidity. Leveraging the representational power of pre-trained Large Language Models (LLMs), we construct a robust pipeline that includes the definition of a ground truth dataset, labeling code fragments as vulnerable or safe. We then fine-tune a BERT-based architecture on this dataset, enabling the model to capture the syntactic and semantic patterns specific to Solidity code. Our fine-tuned model demonstrates strong performance, achieving an F1 score of 92% and highlighting the effectiveness of LLM adaptation in enhancing smart contract security through deep contextual understanding.
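
As a heavily simplified sketch of a fine-tuning pipeline of this kind (not the authors' code; the microsoft/codebert-base checkpoint and the two-fragment toy dataset are assumptions standing in for the curated ground-truth set):

    # Illustrative sketch: fine-tune a BERT-style encoder for binary
    # vulnerable/safe classification of Solidity fragments using Hugging Face
    # transformers. Checkpoint and data are placeholders.
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from datasets import Dataset

    fragments = ["function withdraw() public { msg.sender.call{value: balance}(\"\"); }",
                 "function get() public view returns (uint) { return value; }"]
    labels = [1, 0]                                   # 1 = vulnerable, 0 = safe

    tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=2)

    ds = Dataset.from_dict({"text": fragments, "label": labels})
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, padding="max_length",
                               max_length=256), batched=True)

    trainer = Trainer(model=model,
                      args=TrainingArguments(output_dir="out", num_train_epochs=1,
                                             per_device_train_batch_size=2),
                      train_dataset=ds)
    trainer.train()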

Paper Nr: 114
Title:

TE-CTABGAN+: A Unified GAN Framework for Generating Synthetic Electronic Health Records

Authors:

Yash Malviya, Devesh Bhangale, Chun-Kit Ngan, Sambasiva Rao Gangeini, William Rorbach, Ashley-Kay Basile and Paulo Bandeira Pinho

Abstract: In this work, we propose Temporal Extended (TE)-CTABGAN+, a unified GAN framework for generating synthetic Electronic Health Records (EHRs) that encompass continuous, categorical, and temporal variables. Our framework makes six key contributions: (1) develop a Noise Injection Pipeline (NIP) to introduce controlled variability into synthetic data; (2) utilize CTABGAN+ to produce high-fidelity synthetic EHR data, comprising both continuous and categorical variables, from NIP output; (3) design a Temporal Sequence Processor (TSP) to preserve temporal integrity in EHRs; (4) integrate TimeGAN to capture temporal dependencies in EHR data; (5) conduct extensive experiments on Synthea and MIMIC-IV datasets, evaluating our framework’s fidelity and coherence using statistical metrics, including Kolmogorov–Smirnov (KS), Jensen-Shannon Divergence (JSD), and Chi-Square (CS); and (6) perform a domain expert evaluation with a medical doctor and a healthcare professional, quantifying their ability to distinguish between real and synthetic EHRs using Accuracy, Precision, Negative Predictive Value, Specificity, and Matthews Correlation Coefficient. Our results demonstrate TE-CTABGAN+’s superior performance, achieving the lowest KS, JSD, and CS. The domain expert evaluation shows that both experts struggle to distinguish between real and synthetic EHRs, with performance approximating random chance. These findings indicate that TE-CTABGAN+ generates highly realistic synthetic EHRs, preserving both temporal and non-temporal coherence. Our framework has the potential to facilitate the creation of high-quality synthetic EHR datasets, supporting various healthcare applications while maintaining patient data confidentiality.
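
Purely as an illustration of how such per-variable fidelity metrics can be computed (not the paper's evaluation code; the simulated blood-pressure columns are placeholders):

    # Illustrative sketch: compare one continuous variable of real vs. synthetic
    # EHR data with the Kolmogorov-Smirnov statistic and the Jensen-Shannon
    # divergence over a shared histogram binning.
    import numpy as np
    from scipy.stats import ks_2samp
    from scipy.spatial.distance import jensenshannon

    real = np.random.normal(loc=120, scale=15, size=1000)       # e.g. systolic BP
    synthetic = np.random.normal(loc=122, scale=16, size=1000)

    ks_stat, ks_p = ks_2samp(real, synthetic)                    # low statistic = similar CDFs

    bins = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=30)
    p, _ = np.histogram(real, bins=bins, density=True)
    q, _ = np.histogram(synthetic, bins=bins, density=True)
    jsd = jensenshannon(p, q) ** 2                               # squared distance = divergence

    print(f"KS statistic: {ks_stat:.3f}, JS divergence: {jsd:.4f}")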

Short Papers
Paper Nr: 17
Title:

Rawlsian Agents: An Application of Large Language Models (LLM) to Forge Fairer Bilateral Agreements

Authors:

Sergio M. Ferro, Jenny Tai, Raewyn Tsai, Salone Verma, Jenny Ma, Nora Skjerdal and Martin Lopatka

Abstract: John Rawls' theory of justice as fairness remains a pivotal framework in understanding societal contracts and the fair administration of justice. While highly influential, key concepts from Rawls' framework for comparative justice have proven difficult to realize in practice. In this paper, we present a novel approach that uses General Purpose Artificial Intelligence (GPAI) agents to forge fairer bilateral agreements. We use nuptial agreements to assess the performance of our approach for semi-automatic generation of bilateral legal agreements. Our results demonstrate the utility that Large Language Models (LLMs) can achieve in the analysis of human-made agreements. This approach remains susceptible to some of the same vulnerabilities as human-authored agreements, including intentional and unintentional deception and non-disclosure of key information. Nonetheless, we believe that Rawlsian agents represent a novel application of AI to the authorship of bilateral agreements from a position of fundamental fairness. Potential extensions of this work could reduce systematic inequalities in legal access, and ultimately help realize Rawls' vision of a fairer society.

Paper Nr: 31
Title:

A Cascaded Deep Generative Model for Audio-to-Sign Language Translation

Authors:

Rawan Bafaqih, Lama Almoutiry, Rihan Alqahtani and Nuha Aldausari

Abstract: Individuals with auditory impairments face significant challenges in communicating with others in their daily lives, even during simple interactions. These communication barriers can lead to social isolation and limit opportunities for education, employment, and everyday interactions. Our paper aims to address this issue by developing a speech-to-sign language translation system. This system will convert spoken language into sign language, helping bridge the communication gap between deaf and hearing individuals. Currently, solutions that effectively handle sign language translation are limited due to the complexity of gestures, expressions, and context in sign languages, posing a significant barrier for those who rely on them for communication. By addressing this need, our project aims to provide a vital tool for individuals with hearing impairments, making everyday communication more accessible for everyone. Our model has a sequence of steps. The model first converts the audio signal to text. Then, the text is converted to gloss, a simplified written representation of sign language that removes grammatical inflections and spatial grammar. Finally, the gloss is used to extract keypoints that are utilized to render the final video. Our system is tested against multiple state-of-the-art models using various evaluation metrics and across several datasets.

Paper Nr: 122
Title:

Enhancing Debunking Effectiveness Through LLM-Based Personality Adaptation

Authors:

Pietro Dell'Oglio, Alessandro Bondielli, Francesco Marcelloni and Lucia C. Passaro

Abstract: This study proposes a novel methodology for generating personalized fake news debunking messages by prompting Large Language Models (LLMs) with persona-based inputs aligned to the Big Five personality traits: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness. Our approach guides LLMs to transform generic debunking content into personalized versions tailored to specific personality profiles. To assess the effectiveness of these transformations, we employ a separate LLM as an automated evaluator simulating corresponding personality traits, thereby eliminating the need for costly human evaluation panels. Our results show that personalized messages are generally seen as more persuasive than generic ones. We also find that traits like Openness tend to increase persuadability, while Neuroticism can lower it. Differences between LLM evaluators suggest that using multiple models provides a clearer picture. Overall, this work demonstrates a practical way to create more targeted debunking messages by exploiting LLMs, while also raising important ethical questions about how such technology might be used.

Paper Nr: 148
Title:

Enhancing the A2A Protocol with Blockchain-Based Identities and x402 Micropayments for Agentic AI

Authors:

Awid Vaziry, Sandro Rodriguez Garzon and Axel Küpper

Abstract: This research presents an architecture that enables open, economically viable multi-agent ecosystems. Two core limitations of the Agent2Agent (A2A) protocol are addressed: decentralized agent discovery and integrated agent-to-agent micropayments. The current A2A protocol lacks discoverability, reputation, and integrated payment functionality, preventing agents from reliably finding, trusting, and compensating other agents. Agent-to-agent micropayments are essential as agents require compensation for their services, yet traditional payment methods prove unsuitable for micropayments due to onboarding requirements and overhead costs. To address these limitations, AgentCards are published on-chain as smart contracts, enabling tamper-evident, verifiable agent identities with decentralized discoverability. A2A is further extended through integration with the x402 open standard, which facilitates blockchain-agnostic, HTTP-based micropayments via the HTTP 402 status code. This integration enables autonomous agents to seamlessly discover, authenticate, and compensate each other across organizational boundaries. A comprehensive prototype implementation demonstrates the feasibility of blockchain-based agent discoverability and seamless micropayment integration. Elements of this approach have subsequently been adopted in industry implementations and the development of Ethereum standards. The proposed approach establishes foundational infrastructure for secure, scalable, and economically viable multi-agent ecosystems, advancing agentic AI toward trusted autonomous economic interactions.
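
As an illustrative sketch of an x402-style client flow (not the paper's implementation, and not the normative x402 wire format: the header name and payload fields are assumptions), a request answered with HTTP 402 is retried with a payment proof attached:

    # Illustrative sketch: call a paid agent endpoint, settle payment when the
    # server responds with HTTP 402, then retry with a proof of payment.
    # The "X-PAYMENT" header and the requirements fields are assumptions.
    import requests

    def call_paid_agent(url, make_payment_proof):
        resp = requests.get(url, timeout=10)
        if resp.status_code == 402:                    # payment required
            requirements = resp.json()                 # e.g. amount, asset, pay-to address
            proof = make_payment_proof(requirements)   # settle on-chain, return receipt
            resp = requests.get(url, timeout=10,
                                headers={"X-PAYMENT": proof})
        resp.raise_for_status()
        return resp.json()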

Paper Nr: 158
Title:

Artificial Intelligence in Emotional Induction: A VR and Electronic Music Approach

Authors:

Pedro Benitez, Irene Fondón and María Luz Montesinos

Abstract: This article explores the intersection of Virtual Reality (VR), electronic music, and Artificial Intelligence (AI) as a basis for emotional induction. The proposed framework aims to validate the feasibility of combining immersive VR environments with AI-generated electronic music to influence emotional and cognitive states. The system is designed as a configurable template that allows researchers to experiment with various audiovisual stimuli in a controlled, immersive setting. A central component of the system is the use of generative AI models to produce dynamic electronic music in real time. These models can adapt the auditory output based on predefined parameters or live input from the experimenter, enabling a high degree of customization and responsiveness. The VR environment is designed to be modular and easily reconfigurable, allowing for the integration of different visual elements, spatial layouts, and interaction mechanisms. This template serves as a foundation for future experimentation in fields such as neuroscience, immersive art, and cognitive research. It also provides a flexible platform for testing hypotheses related to the impact of multisensory environments on mental states. The tool is publicly available at https://zenodo.org/records/15656891.

Paper Nr: 159
Title:

DermETAS-SNA LLM: A Dermatology Focused Evolutionary Transformer Architecture Search with StackNet Augmented LLM Assistant

Authors:

Nitya Phani Santosh Oruganty, Keerthi Vemula Murali, Chun-Kit Ngan and Paulo Pinho

Abstract: This work introduces the DermETAS-SNA LLM Assistant that integrates Dermatology-focused Evolutionary Transformer Architecture Search (ETAS) with StackNet Augmented LLM. The assistant dynamically learns skin-disease classifiers and provides medically informed descriptions to facilitate clinician-patient interpretation. Contributions include: (1) Developed an ETAS framework on the SKINCON dataset to optimize a Vision Transformer (ViT) tailored for dermatological feature representation and then fine-tuned binary classifiers for each of the 23 skin disease categories in the DermNet dataset to enhance classification performance; (2) Designed a StackNet architecture that integrates multiple fine-tuned binary ViT classifiers to enhance predictive robustness and mitigate class imbalance issues; (3) Implemented a RAG pipeline, termed Diagnostic Explanation and Retrieval Model for Dermatology, which harnesses the capabilities of the Google Gemini 2.5 Pro LLM architecture to generate personalized, contextually informed diagnostic descriptions and explanations for patients, leveraging a repository of verified dermatological materials; (4) Performed extensive experimental evaluations on 23 skin disease categories to demonstrate a substantial performance increase, achieving an overall F1-score of 56.30%, which notably surpasses SkinGPT-4 (48.51%) and represents a relative performance increase of 16.06%; (5) Conducted a domain-expert evaluation, with eight licensed medical doctors, of the clinical responses generated by our AI assistant for seven dermatological conditions. Our results show a 92% agreement rate with the assessments provided by our AI assistant, demonstrating superior performance compared to SkinGPT-4 (48.20%) by a substantial margin; and (6) Created a proof-of-concept prototype that fully integrates our DermETAS-SNA LLM into our AI assistant to demonstrate its practical feasibility for real-world clinical and educational applications.

Paper Nr: 60
Title:

Analysis of Strategies for Interacting with Large Language Models in Meeting Time Optimization

Authors:

Anton Agafonov and Andrew Ponomarev

Abstract: Optimal meeting time selection based on user constraints expressed in natural language is a challenging decision support problem characterized by implicit and dynamic preferences. This paper investigates interaction strategies with large language models (LLMs) to address this challenge. Two fundamentally different strategies are analyzed: (1) an agentic approach based on generating formal constraints, followed by solving the resulting optimization problem using an external algorithm, and (2) a direct generation approach that obtains a solution from the model in the form of time slots, intervals, or a binary vector. The evaluation is performed using a specially constructed dataset of 100 synthetic queries and the corresponding reference constraints. The results show that the agentic approach demonstrates the highest quality and stability, especially as the number of preferences increases. At the same time, direct generation strategies prove effective when dealing with a small number of constraints and simple formulations, highlighting the importance of selecting an appropriate interaction format with LLMs in practical intelligent systems. These findings open avenues for creating flexible and adaptive scheduling interfaces powered by modern language models.
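
As a sketch of the agentic strategy's second stage, where formal constraints produced by the LLM are handed to an external routine rather than solved by the model itself (illustrative only; the constraint schema and candidate slots are hypothetical):

    # Illustrative sketch: the LLM emits structured constraints (hypothetical
    # schema), and an external filter, not the LLM, selects feasible slots.
    from datetime import time

    constraints = {                       # e.g. parsed from the LLM's structured output
        "earliest": time(10, 0),
        "latest": time(17, 0),
        "forbidden_days": ["Friday"],
    }

    candidate_slots = [("Monday", time(9, 30)), ("Tuesday", time(11, 0)),
                       ("Friday", time(14, 0)), ("Thursday", time(16, 30))]

    def feasible(slot, c):
        day, start = slot
        return (c["earliest"] <= start <= c["latest"]
                and day not in c["forbidden_days"])

    solutions = [s for s in candidate_slots if feasible(s, constraints)]
    print(solutions)                      # a ranking step could then pick the best slot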

Paper Nr: 118
Title:

Data-Driven Insights for Crowdfunding Success in the Developing World

Authors:

Virgilijus Sakalauskas, Dalia Kriksciuniene and Paulius Baltrušaitis

Abstract: By pooling small contributions from a broad audience, crowdfunding platforms enable entrepreneurs and small businesses from developing countries to access critical funding that traditional financial systems may not provide. This study applies machine learning methods to detect success trends in developing-world crowdfunding. Using data from crowdfunding websites such as Kiva.org, a leading microlending platform, we determine what makes projects successful. Our findings reveal several key predictors of campaign success, including the borrower's region, gender, loan amount, and sector of activity. Notably, the models demonstrated strong predictive power, highlighting clear patterns in how these factors influence funding outcomes. Beyond prediction, the study offers valuable insights for practitioners and platform designers aiming to improve campaign visibility and funding efficiency. For example, gender and regional disparities uncovered by the models point to structural imbalances that can be addressed through targeted support or algorithmic adjustments. By translating raw campaign data into actionable knowledge, this research contributes to developing more equitable and effective crowdfunding ecosystems.