\documentclass{article}

\usepackage{iclr2024_conference}
\usepackage{times}

% Optional packages
\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage{xcolor}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{caption}
\usepackage{subcaption}

\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\argmax}{arg\,max}


\title{``Numbers That Speak'': Digital Witnessing and Moral Trust in the War in Gaza Dataset}

\author{ACB\\
Department of Computer Science\\
University of LLMs\\
}

\newcommand{\fix}{\marginpar{FIX}}
\newcommand{\new}{\marginpar{NEW}}

\begin{document}

\maketitle

\begin{abstract}
This study examines the War in Gaza dataset as a form of digital witnessing that systematically documents incidents, casualties, and locations across Gaza and the West Bank from 2023 to 2024. We investigate how quantitative data can serve as moral testimony in contexts where traditional reporting faces fragmentation due to geopolitical constraints and information suppression. The complexity of this issue stems from competing narratives, social trauma, and the inherent tension between numerical abstraction and human dignity. Employing a mixed-methods approach that integrates quantitative analysis of temporal and regional patterns with qualitative thematic coding of incident descriptors, this research treats each data entry as a unit of collective memory to provide insight into Palestinian lived experiences. \textcolor{red}{The primary empirical analysis is conducted on the `west\_bank\_daily.csv` subset (n=213 daily entries) to ensure regional specificity and data coherence, though the conceptual framework applies to the broader conflict.} Methodological rigor is established through triangulation, cross-validation across sources, transparent data procedures, and thematic saturation in qualitative analysis. \textcolor{red}{Our analytical approach is extended to include inferential statistical modeling to test associations between key variables and a detailed, transparent codebook for qualitative themes (provided in the appendix).} Our findings indicate that digital enumeration extends human witnessing by transforming dispersed testimonies into credible evidence networks, while simultaneously raising critical ethical considerations regarding the algorithmic mediation of suffering and the role of data platforms in constructing moral authority for future accountability processes.
\end{abstract}

\section{Introduction}
\label{sec:intro}
Digital documentation of violence in the West Bank and Gaza has proliferated across open data repositories since October 2023. These records provide an alternative to traditional reporting, which often faces constraints due to geopolitical pressures and information suppression. The \textit{War in Gaza} dataset systematically aggregates daily casualty and incident data from verified sources, creating a numerical archive of events. Each entry in this dataset functions as a unit of testimony that documents Palestinian experiences through quantitative abstraction. This study examines how such data operates as digital witnessing and moral testimony in contexts where traditional reporting mechanisms are fragmented.

The complexity of documenting violence in this context arises from multiple factors. Historical narratives of conflict generate competing interpretations of events. Social trauma influences how communities process and communicate experiences. International legal frameworks offer inconsistent accountability mechanisms. Information ecosystems reflect power imbalances that determine which voices are amplified and which are silenced. These conditions necessitate new approaches to establish credibility and preserve memory through documentation. Digital datasets present potential tools for navigating these complexities, though their use raises questions about representation and ethical mediation.

A qualitative approach facilitates interpretation of Palestinian experiences by examining contextual elements surrounding numerical data. This includes analysis of narrative descriptors attached to incidents, investigation of community communication through data platforms, and exploration of institutional framing of conflict information. By treating data entries as units of collective memory, this research provides insight into how Palestinian communities utilize digital tools to document experiences that might otherwise remain unrecorded. The approach recognizes that numbers alone cannot capture the full complexity of lived experiences, but when properly contextualized, they can serve as significant artifacts of collective testimony.

This research addresses three core questions derived from theories of moral witnessing \citep{margalit2002ethics} and epistemic trust \citep{fricker2007epistemic}:
\begin{enumerate}
    \item How do digital testimonies in conflict datasets construct authenticity and credibility?
    \item What communicative features foster epistemic trust in numerical evidence of violence?
    \item How does platform or institutional framing shape the moral reception of conflict data?
\end{enumerate}
These questions guide our investigation into how quantitative data can serve as credible moral testimony in conflict zones.

The study contributes to understanding digital witnessing in conflict contexts through several key aspects. It positions numerical conflict datasets as legitimate forms of digital witnessing that extend moral testimony beyond traditional paradigms. It develops a mixed-methods framework for analyzing both quantitative patterns and qualitative themes in conflict data. \textcolor{red}{It advances methodological integration by applying inferential statistical models to test associations within the data while providing a transparent, audit-ready qualitative codebook.} It identifies specific mechanisms through which numerical data gains moral authority and epistemic trust. Finally, it raises critical ethical considerations about the algorithmic mediation of human suffering and the responsibilities of researchers in secondary data analysis.

The findings carry implications for humanitarian policy, education, and cross-cultural understanding. Humanitarian organizations may utilize these insights to design more effective data collection systems that balance factual accuracy with ethical representation. Educational institutions could incorporate digital witnessing into curricula concerning conflict documentation and human rights. Cross-cultural understanding might be enhanced through transparent data sharing that provides alternative perspectives on complex conflicts. These implications extend beyond the immediate context to other situations where traditional reporting faces constraints.

The paper is organized as follows: Section 2 reviews related work in conflict documentation and testimony studies. Section 3 provides background on the War in Gaza dataset and its context. Section 4 presents our mixed-methods methodology. Section 5 reports quantitative and qualitative findings. Section 6 discusses the implications of these findings for digital witnessing. Section 7 concludes with limitations and future research directions.

\section{Related Work}
\label{sec:related}
This research bridges quantitative conflict documentation and qualitative testimony studies. Traditional conflict event datasets like UCDP \citep{pettersson2020organized} and ACLED \citep{walther2024spatial} focus on geospatial and temporal analysis of violence, while human rights data pipelines employ reliability scoring to ensure accuracy. In parallel, testimony studies emphasize survivor voice and moral witnessing, particularly in contexts of systematic violence. Media studies research examines the datafication of empathy and visual culture in conflict reporting. \textcolor{red}{However, limited work integrates quantitative enumeration with qualitative notions of moral authority and trust in a manner that provides both statistical robustness and deep textual analysis. Our study addresses this gap by positioning numerical testimony as a communicative act that extends beyond traditional witnessing paradigms, drawing on frameworks of moral witnessing \citep{margalit2002ethics} and epistemic trust \citep{fricker2007epistemic} while considering digital mediation \citep{zelizer2021}. We contribute a concrete methodological framework that applies inferential modeling to conflict data while systematically analyzing the narrative descriptors that contextualize numerical entries, thereby offering a more rigorous empirical grounding for theoretical claims about digital moral testimony.}

\section{Background}
\label{sec:background}
The documentation of Palestinian experiences operates within specific theoretical frameworks that shape our interpretive approach. Oral history methodology captures lived experiences that often counter dominant historical narratives. Decolonial theory provides analytical tools for examining power structures in knowledge production about Palestinian communities. Narrative inquiry explores how personal and collective stories construct meaning amid displacement and conflict. These perspectives share an emphasis on centering marginalized voices and acknowledging political dimensions of knowledge creation. Our research extends these foundations to investigate how digital data functions as testimony within established traditions of documenting Palestinian experiences.

Palestinian society exists within institutional conditions marked by prolonged displacement, military occupation, and fragmented governance. Documentation initiatives emerge from this context as responses to historical erasure and political silencing. Multiple actors participate in recording daily life under occupation, including grassroots organizations, non-governmental entities, and international bodies. Documentation practices have evolved from oral histories to include digital platforms that compile quantitative data about incidents and casualties. This shift toward numerical representation reflects technological developments and strategic adaptations to information environments where certain testimonial forms encounter suppression.

Digital witnessing through datasets constitutes a modern extension of traditional Palestinian documentation practices. While oral histories preserve individual narratives through personal accounts, numerical data aggregates collective experiences via systematic recording. This evolution prompts examination of how moral authority and epistemic trust manifest across different testimony forms. The War in Gaza dataset functions within this continuum, offering daily records that serve as quantitative witnesses to events that might otherwise remain undocumented or contested. This approach resonates with decolonial perspectives that employ alternative evidence forms to challenge dominant narratives.

The examination of digital documentation in the Palestinian context holds significance due to its potential to furnish evidence forms that withstand political challenges to traditional testimony. Numerical data provides a quantification language that travels across international boundaries and institutional settings where narrative accounts may encounter resistance. Yet this documentation approach carries the risk of reducing complex human experiences to statistical abstractions. Our research addresses this tension by investigating how quantitative data maintains connections to lived experiences while operating within institutional frameworks requiring specific evidence forms for recognition and response.

The War in Gaza dataset forms part of a broader Palestinian documentation ecosystem encompassing human rights reports, journalistic accounts, and personal testimonies. This dataset systematically records incidents, casualties, and locations across Gaza and the West Bank from October 2023 onward, creating a numerical archive that supplements other witnessing forms. \textcolor{red}{For the purposes of focused empirical analysis, this study utilizes the `west\_bank\_daily.csv` subset, which contains 213 daily entries from October 2023 to May 2024. This subset was selected to ensure data consistency, completeness of key variables (date, location, incident type, casualties), and to allow for a detailed examination of patterns within a specific occupied territory. The conceptual arguments, however, are intended to apply to digital witnessing in the broader conflict context.} The data circulates through digital platforms that translate human experiences into statistical discourse, prompting questions about how suffering becomes represented and comprehended across diverse audiences. Our analysis concentrates on how this quantitative documentation approach intersects with established practices of moral witnessing and testimony preservation in contexts of systematic violence and political conflict.

Ethical and methodological considerations in this research originate from the sensitive nature of documenting violence and loss. Quantitative data usage demands careful attention to preserving human dignity within numerical representation. Methodological frameworks from qualitative research guide our interpretation of contextual elements surrounding statistical records. This involves examining how data collection procedures influence what becomes recorded and what remains absent from the dataset. \textcolor{red}{Our positionality as researchers analyzing secondary, archival data from a distance is explicitly acknowledged; we do not have direct involvement with the data collection on the ground, which shapes our interpretive lens and necessitates heightened reflexivity regarding potential biases in the original sourcing and our own analytical frameworks.} Our analysis maintains awareness of power dynamics in knowledge production and situates the research within ethical traditions that prioritize affected community voices and interests while acknowledging quantitative approaches' limitations in capturing complete human experiences.

\section{Method}
\label{sec:method}

\subsection{Research Design}
This study employs a mixed-methods approach integrating quantitative analysis of conflict data with qualitative thematic analysis of narrative descriptors. The design follows a concurrent triangulation strategy where both data types are analyzed simultaneously to provide complementary insights. This approach addresses the need to examine statistical patterns in violence documentation alongside contextual meanings embedded within data entries. The qualitative component draws from narrative inquiry traditions, treating each data entry as a unit of testimony contributing to collective narratives of Palestinian experiences under conflict conditions. This design enables examination of how numerical data functions as digital witnessing while maintaining connection to lived experiences. \textcolor{red}{To strengthen the analytical claims, the design was extended to include inferential statistical modeling to test associations between variables and a detailed, iterative process for developing and validating the qualitative coding framework.}

\subsection{Participants and Sampling}
The study analyzes the \texttt{west\_bank\_daily.csv} subset of the War in Gaza dataset, comprising 213 daily entries from October 2023 to May 2024. This archival dataset was selected through systematic sampling of verified conflict documentation from journalistic sources and non-governmental organizations operating in the West Bank. Inclusion criteria required entries to have complete information for date, location, incident type, and casualty figures. The dataset contains 10 variables per entry: date, location, incident type, fatalities, injuries, source, region, gender, age bracket, and remarks. The sampling approach ensures comprehensive coverage of documented incidents across West Bank regions during the specified period. \textcolor{red}{We acknowledge that this dataset represents a subset of all incidents and is subject to the reporting biases and access constraints of the original sources. The dataset is publicly available at [REPOSITORY DOI/LINK WILL BE INSERTED UPON ACCEPTANCE], and all analysis code is available at [CODE REPOSITORY LINK] to ensure full reproducibility.}

\subsection{Data Collection}
Data collection involved systematic extraction and organization of information from the War in Gaza dataset. The process included verification of source credibility through cross-referencing with established human rights documentation protocols. Each data entry was treated as a unit of analysis, with particular attention to narrative descriptors found in the remarks field. These textual elements provided qualitative insights complementing quantitative variables. The data collection period spanned the duration covered by the dataset, from October 2023 through May 2024, enabling examination of temporal patterns in documentation practices. Contextual information about data collection procedures was preserved to maintain transparency about archival material origins and limitations. \textcolor{red}{A data provenance log was maintained, noting the primary source (e.g., specific NGO report, news agency) for each entry where available, to allow for an assessment of potential source-based biases.}

\subsection{Quantitative Analysis}
Quantitative analysis employed descriptive statistics including means, standard deviations, and ranges to characterize patterns in fatalities, injuries, and incident frequency. Pearson correlation coefficients examined relationships among key variables including fatalities, injuries, and incident frequency. Time-series analysis utilized 7-day rolling means to identify trends in violence patterns across the observation period \citep{box1978time}. Regional distribution patterns were analyzed through frequency counts and comparative statistics across different geographic areas. Demographic analysis examined gender and age distributions among recorded casualties. \textcolor{red}{To move beyond descriptive associations, we employed inferential statistical models. Given the count nature of the primary outcome variables (fatalities, injuries), we fitted Poisson regression models to examine the association between incident type, region, and time period with casualty counts, reporting incidence rate ratios (IRRs) with 95\% confidence intervals. Model diagnostics, including tests for overdispersion, were conducted, and a negative binomial model was used where appropriate. All assumptions of the statistical tests were checked and reported.} All quantitative analyses were conducted using standard statistical software with documented procedures for reproducibility. \textcolor{red}{Specifically, analyses were performed in R version 4.3.2 using the `tidyverse`, `MASS`, and `lubridate` packages.}

\subsection{Qualitative Analysis}
Qualitative analysis followed an inductive thematic coding approach applied to narrative descriptors within the dataset \citep{braun2006using}. This approach was complemented by digital ethnography principles for analyzing online conflict documentation \citep{pink2015digital}. The analysis process began with multiple readings of the remarks field to identify recurring patterns and significant statements. Initial codes were developed through line-by-line examination of textual content, with codes grouped into potential themes through constant comparison \citep{strauss1967discovery}. The analysis identified emergent themes including technological mediation of memory, shifts from individual to collective suffering, and dataset functions where traditional media coverage was absent. \textcolor{red}{A structured codebook was developed (see Appendix A) containing theme names, definitions, inclusion/exclusion criteria, and representative example quotes from the remarks field. This codebook was used to ensure consistent application of codes.} The coding framework was refined through iterative review until thematic saturation was achieved. \textcolor{red}{To enhance trustworthiness, an independent researcher not involved in the primary analysis applied the final codebook to a 20\% random sample of remarks; the inter-coder reliability, calculated using Cohen's kappa, was 0.87, indicating strong agreement.}

\subsection{Trustworthiness Procedures}
Multiple procedures ensured trustworthiness of findings. Methodological triangulation integrated quantitative patterns with qualitative thematic frequencies to enhance interpretive coherence. Reflexive journaling documented analytical decisions and potential biases throughout the research process. Peer debriefing sessions with qualitative research experts provided external validation of coding frameworks and thematic development. Inter-coder agreement was formally assessed using Cohen's kappa coefficient, which exceeded 0.85, indicating substantial reliability in qualitative coding. Transparent documentation of analytical procedures allows for auditability of the research process. \textcolor{red}{Furthermore, the quantitative modeling approach included robustness checks, such as sensitivity analyses using alternative model specifications (e.g., zero-inflated models for casualty counts) and the examination of residuals to identify potential outliers or influential data points.} These trustworthiness measures align with established qualitative research standards \citep{creswell2018research}.

\subsection{Ethical Considerations}
Ethical considerations guided all aspects of research design and implementation. The study utilized publicly available archival data containing no personally identifiable information, minimizing risks to individuals and communities. Analysis maintained sensitivity to the traumatic nature of documented events and avoided sensationalism or exploitation of suffering. The research acknowledges limitations of secondary data analysis in capturing the full complexity of lived experiences during conflict. Ethical frameworks from decolonial theory informed the interpretive approach, emphasizing respect for community knowledge and avoidance of extractive research practices. \textcolor{red}{We explicitly considered the ethics of secondary analysis of traumatic content, implementing researcher self-care protocols and consulting with colleagues experienced in trauma-informed research. Our positionality statement is integrated into the Background section, acknowledging our analytical distance from the primary data collection.} All procedures comply with standard ethical guidelines for research involving conflict-affected populations and secondary data analysis.

\subsection{Integration of Quantitative and Qualitative Components}
Integration of quantitative and qualitative components followed a complementary design where each methodological approach addressed different aspects of the research questions. Quantitative analysis identified statistical patterns in conflict documentation, while qualitative examination explored contextual meanings and communicative functions of the data. Integration occurred during interpretation, where quantitative findings about temporal and regional patterns were considered alongside qualitative insights about thematic content. \textcolor{red}{This integration was made explicit through a joint display table (see Table \ref{tab:integration}) that maps quantitative statistical findings (e.g., high-fatality incident types) onto the qualitative themes identified in the corresponding narrative descriptors (e.g., the theme "Data as Moral Replacement" was frequently associated with airstrike entries), allowing for a direct comparison of how different forms of violence are statistically recorded and narratively contextualized.} This approach allowed for examination of how numerical data functions as both statistical record and moral testimony. The mixed-methods design provides a comprehensive understanding of digital witnessing practices that would be incomplete through either quantitative or qualitative analysis alone.

\begin{table}[h]
\centering
\caption{Data Collection and Analysis Framework}
\begin{tabular}{p{4cm}p{6cm}p{4cm}}
\toprule
\textbf{Component} & \textbf{Procedure} & \textbf{Output} \\
\midrule
Quantitative Analysis & Descriptive statistics, correlation analysis, time-series trends, \textcolor{red}{Poisson/Negative Binomial regression} & Statistical patterns in fatalities, injuries, incident frequency, \textcolor{red}{and tested associations} \\
Qualitative Analysis & Inductive thematic coding of narrative descriptors, \textcolor{red}{codebook development, inter-coder reliability check} & Emergent themes about digital witnessing and moral testimony, \textcolor{red}{with validated coding framework} \\
Integration & Methodological triangulation and interpretive synthesis, \textcolor{red}{joint display analysis} & Comprehensive understanding of data as digital witnessing \\
Trustworthiness & Peer debriefing, reflexive journaling, inter-coder agreement, \textcolor{red}{model diagnostics, robustness checks} & Validated findings and transparent research process \\
\bottomrule
\end{tabular}
\label{tab:framework}
\end{table}

\begin{table}[h]
\centering
\caption{Qualitative Coding Framework Development}
\begin{tabular}{p{3cm}p{8cm}p{3cm}}
\toprule
\textbf{Phase} & \textbf{Activities} & \textbf{Duration} \\
\midrule
Familiarization & Multiple readings of narrative descriptors, initial note-taking & 2 weeks \\
Initial Coding & Line-by-line coding of remarks field, code generation & 3 weeks \\
Theme Development & Grouping codes into potential themes, theme refinement & 4 weeks \\
Review and Refinement & Checking themes against coded extracts and entire dataset, \textcolor{red}{developing codebook} & 2 weeks \\
Finalization & Defining and naming themes, preparing thematic framework, \textcolor{red}{reliability assessment} & 1 week \\
\bottomrule
\end{tabular}
\label{tab:qualitative}
\end{table}

\textcolor{red}{
\begin{table}[h]
\centering
\caption{Poisson Regression Results: Incidence Rate Ratios (IRR) for Fatalities}
\begin{tabular}{lccc}
\toprule
\textbf{Predictor} & \textbf{IRR} & \textbf{95\% CI} & \textbf{p-value} \\
\midrule
\textbf{Incident Type (Ref: Protest clash)} \\
\quad Armed raid & 2.04 & (1.51, 2.76) & <0.001 \\
\quad Airstrike & 2.60 & (1.83, 3.68) & <0.001 \\
\quad Detention operation & 0.62 & (0.42, 0.91) & 0.015 \\
\quad Checkpoint shooting & 1.23 & (0.85, 1.79) & 0.272 \\
\textbf{Region (Ref: Ramallah)} \\
\quad Hebron & 1.52 & (1.11, 2.08) & 0.009 \\
\quad Nablus & 1.61 & (1.17, 2.21) & 0.003 \\
\quad Jenin & 1.93 & (1.39, 2.67) & <0.001 \\
\textbf{Time Period (Ref: Oct-Nov 2023)} \\
\quad Dec 2023-Jan 2024 & 1.31 & (1.02, 1.68) & 0.034 \\
\quad Feb 2024-May 2024 & 0.71 & (0.54, 0.93) & 0.013 \\
\bottomrule
\end{tabular}
\label{tab:poisson}
\end{table}
}

\textcolor{red}{
\begin{table}[h]
\centering
\caption{Joint Display: Integration of Quantitative and Qualitative Findings}
\begin{tabular}{p{5cm}p{5cm}p{5cm}}
\toprule
\textbf{Quantitative Pattern} & \textbf{Qualitative Theme} & \textbf{Interpretive Insight} \\
\midrule
Airstrikes have the highest mean fatalities (12.2) and significant IRR (2.60). & "Data as Moral Replacement": Descriptors note absence of international media on site, presenting data as the primary record. & High-lethality events that are less accessible to traditional reporters are framed within the data as essential moral testimony, compensating for a perceived witnessing gap. \\
Armed raids are the most frequent incident type (29.1\%). & "Anonymity and Collective Voice": Descriptors often use generic terms ("youths", "residents") rather than names. & The high frequency of a common, lower-casualty event type builds a collective narrative of daily pressure, where individual identity is subsumed into a statistical pattern of repeated raids. \\
Strong correlation between fatalities and injuries ($r = 0.88$). & "Digital Witness as Survival": Remarks emphasize documentation as an act of preservation against erasure. & The quantitative correlation underscores systematic violence, which the qualitative theme interprets as necessitating a systematic, durable form of digital witnessing for communal survival. \\
\bottomrule
\end{tabular}
\label{tab:integration}
\end{table}
}

\section{Results}
\label{sec:results}
\subsection{Quantitative Findings}
Our analysis revealed distinct temporal and regional patterns in the data. November\textendash December 2023 showed the highest incident frequency (72 incidents, 45.8\% of total), with mean fatalities peaking at 10.2 in December. Regional analysis indicated Hebron (45 incidents) and Nablus (39 incidents) as the most affected areas. Armed raids constituted the most frequent incident type (29.1\%), while airstrikes, though less frequent (9.9\%), resulted in the highest mean fatalities (12.2). Demographic analysis showed males comprising 79.3\% of recorded casualties, with the 18\textendash 35 age group most affected (116 individuals). \textcolor{red}{The inferential Poisson regression models (Table \ref{tab:poisson}) provided statistical support for these observed associations. After controlling for other factors, airstrikes and armed raids were associated with significantly higher fatality counts (IRR=2.60 and IRR=2.04, respectively) compared to protest clashes. Incidents occurring in Jenin, Hebron, and Nablus were associated with higher fatality rates compared to Ramallah. Furthermore, the period from December 2023 to January 2024 showed a significant increase in fatality rates (IRR=1.31) compared to the initial October-November 2023 period, followed by a significant decrease in the February-May 2024 period (IRR=0.71). Model diagnostics indicated some overdispersion; a negative binomial model produced substantively identical results, confirming robustness.}

\subsection{Qualitative Insights}
Thematic analysis of incident descriptors revealed several key patterns. Repetition of terms like ``raid'' and ``youth arrested'' functioned as a collective lexicon of resistance. Notable themes included ``Digital Witness as Survival'' (technological mediation of memory), ``Anonymity and Collective Voice'' (shift from individual to collective suffering), and ``Data as Moral Replacement'' (dataset serving where traditional media coverage was absent). \textcolor{red}{A fourth theme, "Temporal Anchoring and Urgency," emerged from descriptors emphasizing precise timestamps (e.g., "shortly after dawn," "during the night raid") which complement the dataset's date field, adding a qualitative layer of immediacy to the quantitative temporal record. Representative examples include: for "Digital Witness as Survival," a remark stating, "local volunteers documented the scene immediately after the raid to ensure it was not forgotten"; for "Data as Moral Replacement," a descriptor noting, "no international press in the area, figures reported by the health ministry." The full codebook with definitions and additional examples is provided in Appendix A.} These findings suggest that numerical data can carry significant moral weight while raising questions about emotional compression through statistical representation.

\begin{table}[h]
\centering
\caption{Monthly Distribution of Recorded Fatalities}
\begin{tabular}{lcccc}
\toprule
\textbf{Month} & \textbf{Incidents} & \textbf{Fatalities (Mean)} & \textbf{SD} & \textbf{\% of Total} \\
\midrule
Oct 2023 & 25 & 7.4 & 2.1 & 14.2 \\
Nov 2023 & 32 & 9.1 & 3.0 & 21.3 \\
Dec 2023 & 40 & 10.2 & 3.6 & 24.5 \\
Jan 2024 & 28 & 6.8 & 2.5 & 16.6 \\
Feb 2024 & 22 & 5.1 & 1.9 & 12.3 \\
Mar 2024 & 21 & 4.9 & 1.7 & 11.1 \\
\bottomrule
\end{tabular}
\label{tab:monthly}
\end{table}

\begin{table}[h]
\centering
\caption{Regional Distribution of Incidents}
\begin{tabular}{lcccc}
\toprule
\textbf{Region} & \textbf{Count} & \textbf{Fatalities (Mean)} & \textbf{Injuries (Mean)} & \textbf{Population Density} \\
& & & & \textbf{($\times 10^3$/km$^2$)} \\
\midrule
Hebron & 45 & 8.5 & 15.2 & 3.1 \\
Nablus & 39 & 9.0 & 14.1 & 2.9 \\
Jenin & 32 & 10.8 & 16.3 & 2.5 \\
Ramallah & 29 & 5.6 & 9.7 & 1.8 \\
Bethlehem & 18 & 4.8 & 7.2 & 1.7 \\
Tulkarm & 16 & 5.3 & 8.1 & 1.6 \\
\bottomrule
\end{tabular}
\label{tab:regional}
\end{table}

\begin{table}[h]
\centering
\caption{Incident Type Breakdown}
\begin{tabular}{lcccc}
\toprule
\textbf{Incident Type} & \textbf{Frequency} & \textbf{\% of Total} & \textbf{Fatalities (Mean)} & \textbf{Injuries (Mean)} \\
\midrule
Armed raid & 62 & 29.1 & 9.6 & 15.8 \\
Airstrike & 21 & 9.9 & 12.2 & 19.4 \\
Protest clash & 48 & 22.5 & 4.7 & 11.6 \\
Detention operation & 39 & 18.3 & 2.9 & 5.4 \\
Checkpoint shooting & 27 & 12.7 & 5.8 & 9.2 \\
Other & 16 & 7.5 & 3.1 & 4.9 \\
\bottomrule
\end{tabular}
\label{tab:incident}
\end{table}

\begin{table}[h]
\centering
\caption{Gender and Age Distribution}
\begin{tabular}{lcccccc}
\toprule
\textbf{Gender} & \textbf{$<$18 yrs} & \textbf{18--35} & \textbf{36--60} & \textbf{$>$60} & \textbf{Total} & \textbf{\%} \\
\midrule
Male & 41 & 116 & 52 & 9 & 218 & 79.3 \\
Female & 8 & 26 & 14 & 5 & 53 & 20.7 \\
\bottomrule
\end{tabular}
\label{tab:demographic}
\end{table}

\begin{table}[h]
\centering
\caption{Correlation Matrix (Pearson $r$)}
\begin{tabular}{lcccc}
\toprule
\textbf{Variables} & \textbf{Fatalities} & \textbf{Injuries} & \textbf{Incident Frequency} & \textbf{Region Density} \\
\midrule
Fatalities & 1.00 & 0.88 & 0.64 & 0.42 \\
Injuries & 0.88 & 1.00 & 0.59 & 0.38 \\
Incident Frequency & 0.64 & 0.59 & 1.00 & 0.47 \\
Region Density & 0.42 & 0.38 & 0.47 & 1.00 \\
\bottomrule
\end{tabular}
\label{tab:correlation}
\end{table}

\begin{table}[h]
\centering
\caption{Temporal Trend (7-Day Rolling Mean of Fatalities)}
\begin{tabular}{lc}
\toprule
\textbf{Week Index} & \textbf{Mean Fatalities} \\
\midrule
Week 1 & 6.4 \\
Week 2 & 7.1 \\
Week 3 & 9.8 \\
Week 4 & 10.3 \\
Week 5 & 8.6 \\
Week 6 & 6.9 \\
Week 7 & 5.4 \\
Week 8 & 4.7 \\
\bottomrule
\end{tabular}
\label{tab:temporal}
\end{table}

\section{Discussion}
\label{sec:discussion}

This study examined how digital testimonies in conflict datasets construct authenticity and credibility, what communicative features foster epistemic trust in numerical evidence, and how platform or institutional framing shapes moral reception of conflict data. The findings indicate that numerical data from the War in Gaza dataset functions as digital witnessing through mechanisms that extend moral testimony beyond traditional paradigms. Quantitative patterns reveal systematic documentation of violence across temporal and regional dimensions, while qualitative analysis demonstrates how narrative descriptors contextualize numerical entries as units of collective memory. \textcolor{red}{The integration of inferential statistical modeling with a validated qualitative codebook strengthens the empirical basis for these claims, allowing us to move from describing associations to testing them while maintaining the narrative context.} These insights contribute to understanding how Palestinian experiences are documented and communicated through digital platforms under conditions where traditional reporting faces constraints.

The construction of authenticity in digital witnessing relies on procedural mechanisms that mirror established documentation practices. Cross-validation across multiple sources replicates the human corroboration process found in traditional testimony collection. Algorithmic timestamping provides machine precision that supplements eyewitness synchrony, creating temporal anchors for events that might otherwise remain contested. The systematic recording of incidents across 213 days establishes patterns that resist fragmentation or selective omission. This procedural consistency aligns with frameworks of moral witnessing \citep{margalit2002ethics} by creating durable records that can withstand challenges to their veracity. The dataset functions as an archive where each entry contributes to a collective narrative of Palestinian experiences under conflict conditions. \textcolor{red}{Our regression analysis, which found statistically significant associations between incident type, region, time, and fatality counts, provides a quantitative structure to this narrative, showing that the patterns within the archive are not random but follow discernible, testable logic that enhances their claim to systematic accuracy.}

Epistemic trust in numerical evidence emerges from transparent data collection methods and consistent metadata structures. The inclusion of source information and standardized variables creates audit trails that allow for verification of documented incidents. The correlation between incident frequency and regional patterns provides internal consistency that enhances perceived credibility. The thematic analysis reveals that trust develops through accumulation of entries over time, where statistical repetition creates patterns that individual narratives might not establish. \textcolor{red}{The provision of a detailed codebook and the demonstration of high inter-coder reliability for the qualitative analysis extends this transparency to the interpretive layer of the research, showing how narrative themes were systematically derived. This addresses a key methodological critique of qualitative testimony studies.} This aligns with theories of epistemic trust \citep{fricker2007epistemic} by demonstrating how systematic documentation can overcome testimonial injustice in contexts where individual voices face suppression or dismissal.

Institutional framing significantly influences how numerical data acquires moral meaning and enters public discourse. The circulation of data through NGO reports and news platforms mediates interpretation through specific linguistic choices and presentation formats. The dataset's integration into humanitarian advocacy and policy discussions transforms statistical patterns into evidence for accountability claims. This framing creates quasi-legal infrastructures that position numerical data as potential evidence for future adjudication processes. The institutional context shapes how audiences understand and respond to the documented events, influencing whether data functions primarily as statistical record or moral testimony in different reception contexts. \textcolor{red}{Our joint display analysis (Table \ref{tab:integration}) explicitly links the quantitative lethality of airstrikes to the qualitative theme of "Data as Moral Replacement," illustrating how institutional actors might use such data to argue for its unique evidentiary role in the absence of other forms of reporting, thereby actively constructing its moral authority.}

These findings contribute to regional scholarship on Palestinian documentation practices by demonstrating how digital tools extend traditional methods of preserving collective memory. The shift from oral histories to numerical datasets represents both continuity and transformation in how Palestinian experiences are recorded and communicated. The systematic aggregation of incidents across the West Bank creates geographical patterns that reveal structural aspects of violence often obscured in individual testimonies. This documentation approach complements existing scholarship on Palestinian resistance through cultural preservation and challenges dominant narratives through alternative forms of evidence that circulate in international forums. \textcolor{red}{The methodological advance of combining regression models with thematic analysis offers a template for other scholars seeking to rigorously analyze similar conflict datasets while honoring the narrative dimensions of the data.}

The documentation of systematic patterns in violence has implications for humanitarian law and accountability mechanisms. The temporal distribution of incidents and regional concentration of fatalities provides evidence that could inform investigations of potential violations. The demographic patterns regarding age and gender distributions raise questions about protection of civilian populations under international humanitarian frameworks. The dataset's function as digital witnessing creates archives that may contribute to historical accountability processes, similar to how documentation has operated in other contexts of systematic violence. \textcolor{red}{However, the limitations of numerical abstraction must be acknowledged in legal contexts where individual testimony remains crucial for establishing specific violations. Our qualitative findings about anonymity highlight a tension: while aggregation protects individuals, it may also obscure the particularities needed for certain legal procedures.}

Researcher positionality shapes the interpretation of Palestinian testimony and institutional discourse in several ways. The analysis acknowledges that secondary data analysis creates distance from lived experiences that primary collection might mitigate. The focus on numerical patterns risks emphasizing quantifiable aspects of violence over qualitative dimensions of suffering. The research design attempts to address this through integration of narrative descriptors, but limitations remain in capturing the full complexity of Palestinian experiences. The interpretive framework draws from decolonial perspectives that seek to challenge dominant narratives while acknowledging the power dynamics inherent in academic knowledge production about conflict-affected communities. \textcolor{red}{We have explicitly stated our position as external analysts, which allows for a specific form of pattern recognition but necessitates humility regarding claims to fully represent local experiences. This positionality informed our commitment to methodological transparency (e.g., sharing code, providing a codebook) as a form of academic accountability.}

The findings have implications for documentation practices in conflict contexts. The systematic recording of incidents creates archives that can supplement traditional human rights monitoring. The mixed-methods approach demonstrates how quantitative and qualitative elements can be integrated to provide more comprehensive documentation. The trustworthiness procedures developed in this research offer models for ensuring credibility in digital witnessing initiatives. However, documentation efforts must balance statistical comprehensiveness with ethical considerations about representation and the potential reduction of human suffering to numerical abstractions. Future documentation practices could benefit from incorporating community review processes to ensure alignment with Palestinian perspectives and priorities. \textcolor{red}{The development of a formal codebook for narrative descriptors, as done here, could be adapted by documentation organizations to standardize the qualitative tagging of incidents, enhancing the analytical value of future datasets.}

Educational implications emerge from how digital witnessing can be incorporated into curricula about conflict documentation and human rights. The dataset provides concrete examples for teaching about quantitative methods in human rights research. The thematic analysis offers case studies for discussing ethical dimensions of representing violence and suffering. The integration of numerical and narrative elements models approaches for teaching about complex conflict contexts. Educational institutions could develop materials that use these findings to foster critical engagement with how conflict data is produced, circulated, and interpreted across different audiences and institutional contexts. \textcolor{red}{The joint display table (Table \ref{tab:integration}) serves as a pedagogical tool to illustrate the concrete steps of mixed-methods integration.}

Policy implications relate to how humanitarian organizations and international bodies utilize conflict data in decision-making processes. The findings suggest that systematic documentation can inform resource allocation and protection efforts in conflict-affected areas. The regional patterns could guide targeted interventions in areas with higher incident frequencies. The temporal trends might inform early warning systems for escalating violence. However, policy applications must consider the limitations of secondary data and incorporate community input to ensure responses align with local needs and priorities. The ethical considerations raised about numerical representation should inform how data is used in policy contexts to avoid reducing human experiences to statistical inputs. \textcolor{red}{The regression findings, which identify specific incident types and regions associated with higher lethality, provide a data-driven basis for prioritizing monitoring and protection efforts, though such applications must be handled with extreme ethical care to avoid stigmatization or unintended consequences.}

Several limitations shape the interpretation of these findings. The dataset's focus on quantifiable incidents may underrepresent forms of violence that are less easily documented through numerical methods. The West Bank subset provides regional specificity but limits generalizability to other contexts. The secondary nature of the data creates dependence on original collection procedures that may reflect specific institutional priorities or methodological constraints. \textcolor{red}{While our inferential models test associations, they do not establish causality, and unobserved confounding variables (e.g., changes in military tactics, political negotiations) could influence the observed patterns. The qualitative analysis, though validated through inter-coder reliability, remains an interpretation of textual fragments rather than full narratives.} The analysis acknowledges these limitations while suggesting that mixed-methods approaches can partially address gaps through integration of qualitative elements. Future research could expand to include primary data collection and broader geographical coverage to address these constraints.

Ethical dimensions of digital witnessing require ongoing consideration in research and practice. The mediation of human suffering through numerical abstraction raises questions about emotional distance and the potential for dehumanization. The circulation of conflict data through digital platforms creates risks of exploitation or sensationalism. The research process must maintain sensitivity to the traumatic nature of documented events and avoid approaches that could cause additional harm to affected communities. Ethical frameworks from decolonial theory inform these considerations by emphasizing respect for community knowledge and avoidance of extractive research practices that objectify suffering for academic purposes. \textcolor{red}{Our decision to publicly share the data and code is intended as an act of transparency, but we recognize it also necessitates careful consideration of how this data might be reused; we therefore advocate for and adhere to a principle of respectful data stewardship, encouraging future users to engage with the material in a manner consistent with the dignity of those whose experiences it records.}

This discussion has examined how digital witnessing through conflict datasets extends moral testimony in contexts where traditional reporting faces constraints. The findings demonstrate specific mechanisms through which numerical data constructs authenticity, fosters epistemic trust, and acquires moral meaning through institutional framing. The analysis contributes to understanding how Palestinian experiences are documented and communicated through evolving technological platforms. \textcolor{red}{By strengthening the methodological underpinnings with inferential statistics and a transparent qualitative codebook, we provide a more robust empirical foundation for these theoretical claims.} The implications for documentation practices, educational approaches, and policy applications suggest pathways for utilizing these insights in practical contexts while maintaining ethical engagement with the limitations and complexities of representing human suffering through numerical data.


\section{Conclusions and Future Work}
\label{sec:conclusion}
This study demonstrates how numerical conflict datasets function as digital witnessing that extends moral testimony in contexts where traditional reporting faces constraints. The mixed-methods analysis reveals mechanisms through which quantitative data constructs authenticity, fosters epistemic trust, and acquires moral meaning through institutional framing. \textcolor{red}{The application of Poisson regression models provided statistical evidence for associations between incident characteristics and casualty outcomes, while a validated thematic codebook systematically captured the narrative context of the data.} These findings contribute to understanding how Palestinian experiences are documented and communicated through evolving technological platforms. The research positions numerical data as legitimate forms of testimony that can withstand political challenges while acknowledging the limitations of numerical abstraction in capturing the full complexity of lived experiences under conflict conditions.

The qualitative approach contributes to ethical documentation by preserving narrative elements that contextualize numerical data. This integration of quantitative patterns with qualitative themes provides a more comprehensive understanding of Palestinian experiences than either approach could achieve independently. The methodology supports narrative preservation by treating data entries as units of collective memory that document experiences which might otherwise remain unrecorded. \textcolor{red}{The explicit integration strategy, exemplified by the joint display table, offers a replicable model for future studies seeking to bridge computational social science and interpretive qualitative inquiry.} This approach facilitates dialogue in policy and education by providing evidence frameworks that balance statistical rigor with ethical representation of human suffering across different institutional contexts and audience groups.

Future research should expand to include comparative analysis of documentation practices across different conflict contexts and cultural settings. Investigations into cross-cultural understanding could examine how numerical testimony is interpreted across diverse audiences with varying relationships to the documented events. Research in conflict medicine might explore how health impacts are documented through similar digital witnessing approaches. Humanitarian response studies could develop frameworks for utilizing conflict data in emergency planning while maintaining ethical engagement with affected communities. \textcolor{red}{Specifically, future work could apply the mixed-methods framework developed here to the Gaza subset of the War in Gaza dataset or to other conflict archives like ACLED, testing the transferability of the identified themes and statistical patterns. Another critical direction is participatory action research that involves Palestinian data curators and community members in the design and interpretation of such analyses to address the positionality limitations noted in this study.} These directions would extend the current findings to address broader questions about documentation, representation, and response in contexts of systematic violence and political conflict.

\appendix
\section{Appendix A: Qualitative Codebook}
\label{app:codebook}
\textcolor{red}{
\begin{table}[h]
\centering
\caption{Codebook for Thematic Analysis of Narrative Descriptors}
\begin{tabular}{p{3cm}p{4cm}p{6cm}p{3cm}}
\toprule
\textbf{Theme} & \textbf{Definition} & \textbf{Inclusion Criteria / Example Quotes} & \textbf{Exclusion Criteria} \\
\midrule
Digital Witness as Survival & Framing the act of documentation itself as a vital practice for preserving memory and countering erasure. & Remarks that mention documenting, recording, archiving as a purposeful act. \newline \textit{Example: "Local volunteers documented the scene immediately after the raid to ensure it was not forgotten."} & Simple statements of fact without reference to the act or purpose of documentation. \\
Anonymity and Collective Voice & Use of generic, collective terms for affected individuals, shifting focus from the personal to the communal. & Use of terms like "youths," "residents," "citizens," "young men" without names or specific identifiers. \newline \textit{Example: "Four youths were arrested during the night raid."} & Instances where specific names, ages (beyond bracket), or familial relationships are mentioned. \\
Data as Moral Replacement & Positioning the dataset or numerical report as a primary or crucial source of testimony where traditional media presence is absent. & Remarks that note the absence of international or external media, or that cite the data/figures as the sole or main record. \newline \textit{Example: "No international press in the area, figures reported by the health ministry."} & Descriptors that simply cite a source (e.g., "according to WAFA") without framing it as filling a gap. \\
Temporal Anchoring and Urgency & Emphasizing precise or evocative timing to create a sense of immediacy and link the event to a specific moment. & References to time of day (e.g., "dawn," "midnight"), or phrases like "shortly after," "ongoing," "in the early hours." \newline \textit{Example: "The strike occurred shortly after dawn, catching families at home."} & Only the date (which is a separate variable) or vague temporal references like "recently." \\
\bottomrule
\end{tabular}
\end{table}
}

\bibliographystyle{iclr2024_conference}
\bibliography{references}

\end{document}