Software Litigation Consulting

Andrew Schulman
Consulting Technical Expert & Attorney

AI-generated answers to some questions about software litigation and AI

  • Following this introduction, the text below comes from a chatbot created using Google’s NotebookLM (NBLM). The chatbot acts as a front-end to this web site, and is based on about 150 web pages from SoftwareLitigationConsulting.com. You can access the chatbot at: https://notebooklm.google.com/notebook/b659034d-b640-4385-a572-e27ce2df4962/preview
  • NBLM is an LLM+RAG (Retrieval-Augmented Generation) system. While, like LLMs generally, it has been trained on huge amounts of text, its answers to questions/prompts are based on a much smaller set of specific documents (here, the contents of SoftwareLitigationConsulting.com); a minimal sketch of the retrieval step appears just after this list of bullets. For more on RAG, see e.g. Kimothi’s book A Simple Guide to Retrieval Augmented Generation.
  • It’s an odd experience using a RAG like NBLM to generate new text from one’s own writing. I’ve uploaded my writing into NBLM; NBLM answers questions based on my writing; but what it produces is largely brand-new writing, based on mine but different. It often pulls pieces from multiple documents that I wrote at different times.
  • A RAG chatbot, at least as I’m using it here, feels like a machine for producing derivative works. (At least loosely speaking: NBLM’s expression generally isn’t mine, except when it explicitly quotes phrases from my writing.)
  • Does it “understand” the material enough to do a competent job? Can it summarize usefully? (See other pages at this site about AI summarization issues.) How much BSing does it do to produce decent-looking output, without actually engaging with the contents? Can it surface low-level details when discussing higher-level topics? (NBLM seems to have a general preference for high-level descriptions, without proper nouns, quotations, or specifics.)
  • It is difficult to directly compare this NBLM experience with ChatGPT and Claude Projects, which are limited to fewer uploaded docs than the 150 I’m relying on here. A comparison is possible with my “chatbook” for a forthcoming Patent Litigation book, where my core material is a much smaller number of very large docs (rough drafts and rough notes). This site’s home page links to both NBLM and ChatGPT “chatbooks” on Patent Litigation. Actually, the challenge here is easier than with the PatLitig chatbook: there, NBLM has a collection of very rough drafts and notes, and can answer questions in polished prose, based on those rough notes. Here, NBLM has a set of web pages from a public web site which, while not well organized, at least contains material that was sufficiently “ready for prime time” for me to “publish” it on the web.
  • I’ve hesitated to post NBLM’s output, since there’s a potential for self-regurgitated slop. While I am refraining from feeding this web page back into NBLM as a “Source” file, at some point the Google web crawler will find this page, and it will be my own small contribution to the problem of AIs consuming their own fumes. Perhaps adding my comments and critiques in [bold italics] helps.
  • Slop: what happens when one starts feeding NBLM output back in as an NBLM source? (NBLM makes it fairly easy to do this.) It seems that if NBLM output is fed back in as a source, it ends up getting preferential treatment in future queries — not from any deliberate policy, but rather because the NBLM output is higher-level, more “cooked” than the original source files, and so becomes an easier source for answering questions, helping NBLM avoid digging into lower-level raw material (see the forthcoming “self-referential” page for a specific example).
  • It may seem there is no need to post such answers at all, since one can always ask the chatbot for them, but I at least have trouble critically assessing the quality of a chatbot answer without having something in fixed form. My purpose in posting this material is to see how it sits as potential first-draft material for my book writing. How close or far is this from material that could be used in a book? While right now I’m simply dumping in some answers, and am going to critically stare at them for a while, I likely should be using this experience to better craft prompts to elicit the sort of material I want, and to help figure out how to work faster, closer to AI pace, without becoming overly accepting of its output.
  • NBLM’s responses are far from perfect, but they give me something to react to. That is way better than staring at a blank page. And one can ask it, over & over, to iterate towards better answers (given LLM “temperature,” even asking the identical question multiple times is likely to generate different answers). Compare this with the experience of writing entirely by oneself. See Ethan Mollick’s article on ChatGPT-5, and Ezra Klein on ChatGPT-5. See Ethan Mollick on “good enough prompting” and “good enough pitfalls” (“As soon as the AI model is good enough, everyone tends to fall asleep at the wheel”).
  • Is NBLM’s output productive? Even if sometimes wrong, is it likely to spawn further questions, “delving” (to use AI’s apparent favorite word), or is it more likely a dead-end?
  • An odd kind of assistant: on another page here (on a “self-referential” test), I will be asking questions like “Given an AI doesn’t understand what it is talking about, is there any reason to expect meaningful prose?” (the perhaps surprising answer is Yes), and “What weird sort of writing is this with RAG?” (note the difference between an LLM and an LLM+RAG: with a standard LLM, it is harder to trace a connection between its output and any specific set of sources; with an LLM+RAG we know where the underlying material is coming from).
  • See “Why so much here on NotebookLM?” section at end of this page.
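
To make the LLM+RAG point above a bit more concrete, here is a minimal Python sketch of the retrieval step. The embedding model, the page snippets, and the retrieve() helper are all hypothetical illustrations of the general technique, not anything NotebookLM discloses about its own internals:

    # Minimal RAG sketch (not NBLM's actual implementation): embed site pages,
    # retrieve the pages most relevant to a question, and hand only those
    # pages to an LLM as context for generation.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

    pages = {                                         # stand-ins for ~150 site pages
        "opstrings.html": "Comparing Win32 binaries using opstrings, i.e. sequences of ...",
        "claim-charts.html": "Local Patent Rules require element-by-element comparison ...",
    }
    names = list(pages)
    page_vecs = model.encode([pages[n] for n in names], convert_to_tensor=True)

    def retrieve(question, k=3):
        """Return the k pages most semantically similar to the question."""
        q_vec = model.encode(question, convert_to_tensor=True)
        scores = util.cos_sim(q_vec, page_vecs)[0]
        top = scores.topk(min(k, len(names)))
        return [(names[int(i)], float(scores[int(i)])) for i in top.indices]

    print(retrieve("How are opstrings used to find binary clones?"))

The generation step then assembles a prompt from the retrieved material plus the question, which is why the output is largely brand-new writing even though the underlying material is mine.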

[TOC]


What’s this web site about? What are some surprising or odd things here?

Wanting to test whether an LLM+RAG like Google’s NotebookLM (NBLM) would synthesize useful conclusions from multiple sources (here, the contents of this web site, SoftwareLitigationConsulting.com), I asked it:

“Are there 3 or 4 underlying principles that tie together, or support, this site’s coverage of reverse engineering, source-code examination, AI internals/implementation, patent litigation (especially use of patent claims), and expert evidence? Are these just a bunch of different interests of the author, or are there some more fundamental things that tie them together?”

In response, NBLM shot back a description of “four underlying principles that connect” the material at this site, including “The Principle of Textual Alignment and Disaggregated Comparison” and “The Principle of Factual Granularity and Pinpointing”.

This wasn’t bad at all, and was way better than staring at a blank page, but in some way I couldn’t put my finger on, it wasn’t what I wanted. For one thing, it hadn’t picked up on what I think of as the dominant theme here: the importance of small low-level details, on the one hand, to “big picture” conclusions on the other. So I followed up with:

“This is useful, but feels a bit forced or contrived. Could you try again? For example, one might emphasize how disaggregated comparison (e.g. between patent claim limitations) relates to the use of “features” in AI/ML-based classification? Something like that?”

NBLM responded noting, “The previous principles were accurate but focused more on the legal and practical necessity of the tasks.” Which is something I hadn’t raised, but which was indeed a problem with its previous response. It then presented me with an outline of “four fundamental principles [that] tie together reverse engineering, source-code examination, AI implementation, patent litigation, and expert evidence.” These inferred “principles” included “Disaggregation as Universal Feature Engineering”, “Feature Mapping via Semantic Embeddings”, “The Centrality of Structural Features and Implementation”, and “Expert Validation of Extracted Features (Reliability)”. Because NBLM is an LLM+RAG, all this was based on my supplied material (this web site). It might actually be a good summary of my major themes that I’m only half-conscious of, but given how good the summary looked, it was difficult for me to critically assess it. So I tried to simplify:

Me: This is much closer to what I was looking for. Let me also ask a different question: What is one sentence, or at most 2 sentences, that ties together this material at the site, in a way that emphasizes underlying methodology or approach (as opposed to the “legal and practical necessity of the tasks” as you put it)?

NBLM: The underlying methodology unifying software analysis and AI interpretability is the non-holistic, disaggregated feature analysis, which requires technical experts to break down complex systems (code or neural networks) into discrete structural elements (SPAFFs) and establish their specific locations to match legal requirements. This approach relies on a verifiable methodology, demanding active skepticism and independent testing (e.g., against random baselines) to ensure that all extracted features are reliable evidence of implementation, not mere conjecture or architectural artifacts.

(Here, “SPAFF” came from an overview at this site of some source-code cases: “Under LPR 3-1(c), P’s claim chart must refer to at least one ‘structure, process, algorithm, feature or function [SPAFF] of any accused product’ (citing Connectel)”, referencing Michael Sutton v. Nokia, 2009.)

Well, this answer wasn’t quite it either. It feels like undergrad BS, stringing together the phrases one might expect to see in a good answer, but without much sense of how the things those phrases stand for actually relate. So I tried something even simpler:

Me: Tell me one surprising, odd, or interesting thing from this web site.

NBLM: One surprising and technologically interesting thing discussed on the site is the demonstrated ability of an AI chatbot to reverse engineer software from a binary file from the 1980s.
Specifically, Anthropic’s Claude AI successfully reconstructed the complete working C source code of a 35-year-old DOS utility (MAKEDBF), starting only from the original executable file. Claude performed this by treating the process like “solving a giant Sudoku puzzle,” using internal strings, error messages, and help text found in the binary as constraints to piece the code back together through a technique it called “constraint-based reconstruction”. This capability was viewed by some as an emergent “spark of AGI”.
NBLM got this from “Claude 3.7 turns 1988 and 2015 binary EXEs into working C code (and both exaggerates and undersells its own reverse engineering skills)“. While it’s hard for me to be objective about what is surprising or odd or interesting about my own web site, this feels like NBLM made a good selection. The “viewed by some as an emergent ‘spark of AGI‘” sounds over the top, but does reflect something in the web page.
Me: Nice. Please give me a few other examples like that, not necessarily AI related, but with some of the same nature as the example you just gave.
NBLM: The element of surprise in the MAKEDBF example (reconstructing source code using “constraint-based reconstruction” based on external knowledge and binary strings, rather than disassembly) is mirrored in several other non-AI-assisted reverse engineering and source-code examination techniques, which reveal hidden internal facts from non-obvious external clues: [NBLM did not link to specific web pages; I’ve added links below]
  1. Finding Binary Clones using Opstrings: Experts have developed methods to automatically compare compiled binary code (like DLLs or Win32/x86 executables) without access to the source code. This is achieved by creating low-level “fingerprints” or “function digests” using horizontal sequences of disassembled code called opstrings [Part 2, Part 3]. This technique allows analysts to pinpoint the migration or “commingling” of code between different files or versions, even for proprietary software. [For a toy illustration of the opstring idea, see the sketch after this list.]
  2. Tracing Execution from Visible Error Messages: In litigation, an investigator can take a visible artifact—such as an error message displayed by the accused product—and use dynamic reverse engineering tools (disassemblers and debuggers) to trace backward to the exact lines of object code that triggered the message. This allows an assessment of whether the error was technically necessary, a bug, or a deliberate incompatibility. [The reference to deliberate incompatibility suggests NBLM based this on the “AARD” error message that featured in Microsoft antitrust litigation. There are only some scattered references at this web site (including one on relating a vendor’s code on the one hand with its internal documents on the other); interesting that these were apparently sufficient for NBLM to surface this reference. See the Wikipedia article on AARD and “Examining the Windows AARD Detection Code“.]

  3. Measuring Latent Feature Usage with Kernel Drivers: For operating system or complex application software, investigators have designed and written software, sometimes using Windows NT kernel device drivers or specialized external hardware, to measure the usage or non-usage of specific litigated microprocessor special flags or features by each process running on the system. This highly focused, low-level dynamic analysis determines if a feature, even if “latent” in the source code, is actually being utilized. [I couldn’t at first think what NBLM was referring to here, so I asked it for more details. It mentioned “device drivers to help attach an external Tektronix counting device to Pentium debug registers.” That Tektronix reference was sufficient to jog my memory. This comes from the list of projects here, which includes: “In antitrust case, wrote device drivers to help attach external Tekronix [sic; NBLM silently corrected my misspelling] counting device to Pentium debug registers, to measure usage or non-usage of operating-system features.” I think the original idea came from Prof. Lee Hollaar. I had a program that used single-step interrupts to count use or non-use of OS features; there is something related in the INTCHAIN program in Undocumented DOS, 2nd edition: “A blast from the past (1994?!): Disassembling DOS“]

  4. Recovering “Quasi-Source” via Decompilation: For certain virtual-machine languages, such as Java, Flash ActionScript, or Microsoft .NET code, the executable products can be decompiled into a “close-enough representation of the source code”. This ability to obtain quasi-source from object code provides high-level human-readable code, often revealing structural details that are critical for an element-by-element analysis, bypassing the extreme difficulty typically associated with native C/C++ binary disassembly. [As NBLM describes it here, this is the least interesting of the four examples. It sounds like a description of decompilation, which would once have been amazing but has become standard, especially for bytecodes, and with NSA’s Ghidra is even available for machine code. But NBLM could have made this more surprising or interesting if it had pointed specifically to the work of the SRI Lab at ETH Zurich, briefly discussed at this site, using “Big Code” in Java decompilation and in JavaScript de-obfuscation.]
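
[To make the opstring idea in item 1 concrete, here is a toy Python sketch of my own, not the actual opstring/function-digest tooling described in the linked articles; the mnemonic sequences are invented. The idea: reduce each disassembled function to a normalized sequence of mnemonics (ignoring operands), digest it, and look for matching digests across two binaries.]

    # Toy "opstring" illustration: fingerprint each disassembled function by its
    # sequence of mnemonics (ignoring operands), then find identical fingerprints
    # across two binaries. The mnemonic lists below are made up for illustration.
    import hashlib

    def opstring_digest(mnemonics, strip=("nop",)):
        """Normalize a function's mnemonic sequence and return a short digest."""
        normalized = " ".join(m.lower() for m in mnemonics if m.lower() not in strip)
        return hashlib.sha256(normalized.encode()).hexdigest()[:16]

    binary_a = {  # function name -> mnemonic sequence (hypothetical)
        "check_license": ["push", "mov", "cmp", "jne", "call", "ret"],
        "init_table":    ["push", "xor", "mov", "loop", "ret"],
    }
    binary_b = {
        "fn_4012a0": ["push", "mov", "cmp", "jne", "call", "ret"],  # clone of check_license
        "fn_4013f0": ["push", "mov", "add", "ret"],
    }

    digests_a = {opstring_digest(m): name for name, m in binary_a.items()}
    for name_b, mnemonics in binary_b.items():
        d = opstring_digest(mnemonics)
        if d in digests_a:
            print(f"{name_b} in B matches {digests_a[d]} in A (digest {d})")
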
Me: Now take each of the 5 examples you have surfaced (the 4 numbered ones, plus the initial “MAKEDBF” example), and give me one or two sentences each on what higher-level principle the example is related to.
NBLM:
0. MAKEDBF Reconstruction (AI Rebuilding C from Binary): This process exemplifies the principle of Inferring Structure from Functional Constraints, where domain knowledge and external textual clues (strings, error messages) are used by AI to reconstruct the implementation details (“way”) necessary to satisfy all observable constraints, effectively bridging massive information gaps.
1. Opstrings and Function Digests (Binary Cloning): This relates to the principle of Structural Feature Disaggregation and Normalization, where the complex technical material (binary code) is reduced to minimal, invariant feature sequences (opstrings/mnemonics) to enable scalable, non-holistic comparison and identification of identical implementations across different files.
2. Tracing Execution from Error Messages (Dynamic Pinpointing): This directly supports the principle of Factual Granularity and the “Location Where” Requirement, leveraging a visible artifact or run-time behavior (the error message) to definitively pinpoint the exact, low-level object code implementation that generated it, satisfying the high standard of specificity required in claim charts.
3. Measuring Latent Feature Usage with Kernel Drivers (Dynamic Analysis): This technique adheres to the principle of Verifiable Implementation (Way) Over Latent Structure, using dynamic testing to confirm that a specific, low-level feature (like a microprocessor flag) is actively executed and operational in the final product, rather than existing merely as non-executed code.
4. Recovering “Quasi-Source” via Decompilation (VM Languages): This demonstrates the principle of Reducing the Analytical Gap for Factual Scrutiny, exploiting characteristics of virtual machine code (Java, .NET) to rapidly generate human-readable source code, making the underlying structural implementation features immediately available for element-by-element comparison and expert analysis.
[Some of this sounds BS-ey, especially #4, but basically pretty good, something to work with. Is it realistic to expect more than being given something to work with?]

Lots of AI-related questions one can ask

Me: What are some AI-specific questions I can ask here?

NotebookLM chatbot, using ~150 sources from SoftwareLitigationConsulting.com: Based on the sources and our conversation, you can ask a wide range of AI-specific questions covering its capabilities, architecture, limitations, interpretability, and practical applications in fields like software reverse engineering and patent litigation.

Here are some AI-specific questions you can ask, categorized for clarity:
AI Capabilities and Applications
• How can AI models like Claude analyze numeric data, and is this ability a natural extension of next-token prediction or something new layered on top?
• What unique skills do AI systems possess, particularly in associative pattern matching, that go beyond what a human or even traditional computational methods can achieve?
• How can AI be used for “Code Mining” or “Big Code” to extract useful information from publicly accessible commercial software files, such as indexing contents for prior art searches or finding security vulnerabilities?
• What are the implications of AI’s “le tout ensemble” (holistic, system-level) optimization approach for the future of LLM inference?
• How can AI tools be applied to patent litigation tasks, such as:
    ◦ Drafting claim charts for infringement and invalidity contentions, and initial drafts of expert reports?
    ◦ Mining patent office datasets and PTAB data, linking them to patent claims to assess what is unique about litigated claims?
    ◦ Comparing litigated versus non-litigated patents across classifications and claim characteristics?
    ◦ Finding potential infringement or non-infringement using semantic similarity between claims and product-related text?
• How can AI generate working Python code for neural networks (e.g., for regression tasks) and assist in debugging, fixing problems, and extending the code?
• How can AI help create checklists of issues to consider when comparing two pieces of similar code, or when analyzing binary executables?
• What are the uses of patent mining based on claims and patent searching based on semantic similarity of claim language?
• How can AI facilitate patent “landscaping” using patent claim sentence embeddings to map claims into a vector space and identify “holes” (unclaimed technological areas)?
• How can Masked Language Modeling (MLM) be used to assess patent claim novelty or non-obviousness by testing the predictability of masked tokens? [For a minimal sketch of this masked-token test, see the code after this list of questions.]
• What are the practical applications of linear arithmetic on embeddings and difference vectors for patent analysis?
• How can LLMs assist in tracing the flow of control, detecting missing code, or changes between versions in multi-file source code projects?
• How well can LLMs answer questions about whether a particular code construct is unusual or common?
AI Interpretability and Explainability (xAI)
• Which interpretability/explainability techniques (such as SHAP, LIME, Layer-wise Relevance Propagation, Integrated Gradients, PCA, t-SNE, UMAP, clustering, activation, saliency maps, feature-attribution) are most likely to reveal patterns in how a neural network learns mathematical functions?
• What is the gap between interpreting a network’s processing and getting explanations that align with human mathematical thinking, and do hybrid approaches (e.g., incorporating symbolic reasoning with neural learning) offer promise to bridge this gap?
• How can interpretability techniques give us some view of intermediate concepts learned by a neural network, even if not fully human-interpretable?
• How can we compare interpretability results against a baseline from random data to distinguish actual learning patterns from architectural artifacts?
• What are the limitations of current interpretability techniques in revealing exactly how networks represent mathematical relationships?
• Why can’t backpropagation weights be directly used to interpret a model?
• How does error correction during training relate to the emergence of organized computational strategies in neural networks?
• What is the relationship between interpretable AI, explainable AI, and alignment on one hand, and software reverse engineering on the other?
• If AI can help us reverse engineer the software we create, how long will it be until we can effectively use AI to reverse engineer the thinking processes within other AI models, and what would that transparency reveal about the true nature of intelligence?
AI Architecture and Training
• How does network architecture influence the kinds of mathematical relationships that a neural network can learn?
• How do “instruction fine-tuning” methods (like FLAN, InstructGPT, Constitutional AI) enable AI models to understand and follow natural language instructions for using external tools and generalizing to new tasks?
• What is the difference between DeepSeek’s approach (training a neural network for computation and reasoning) and Anthropic Claude’s (a language model designed to call external APIs with fine-tuning and user context)?
• How does the Mixture of Experts (MoE) architecture enable scaling models without significantly increasing compute, and how does it impact learning complex tasks and behaviors?
• What are Rotary Positional Embeddings (RoPE) and how do they encode token positions within a Transformer model?
• How does tokenizer.apply_chat_template() prepare input prompts for chat-style interaction, and where can custom chat templates or extended prompts be provided?
• To what extent are “reasoning” and the ability to show a “chain of thought” emergent properties of training, versus requiring explicit code?
• What is the significance of the “768” dimension in embeddings (n_embd) and its relationship to max_tokens in LLMs?
• How do Transformers differ from plain Multi-Layer Perceptrons (MLPs) in terms of their attention mechanism, multi-head attention, positional encoding, residual connections, scale, training objectives, and contextual word embeddings?
• How do Transformers and attention mechanisms manage conversations without direct recurrence, instead using the entire conversation history as input?
• Why is “attention” considered “all you need” in foundational papers, and how central is it to modern AI?
• What is Low-Rank Adaptation (LoRA) and how does it efficiently fine-tune large language models?
• What is the relationship between next-token prediction training and the emergence of logical abilities in LLMs?
AI Limitations and Challenges
• What are the limitations of current AI models in providing specific sources or citations for their answers, and what are ongoing efforts to improve this?
• To what extent do AI chatbot answers about their own implementation reflect actual “introspection” versus generalized descriptions based on their training data?
• How does the “idiot savant” phenomenon manifest in AI chatbots (producing brilliant answers alongside simple mistakes) and what factors contribute to it?
• Why do AI chatbots tend to be overly agreeable or sycophantic, and how can a user prompt them to provide more critical feedback or tell them when they are wrong?
• What are the challenges and risks associated with AI summarization, including accuracy issues, determining importance, overgeneralization, and glossing over critical details?
• Can LLMs truly “understand” what they are talking about, or is their apparent comprehension more akin to “Potemkin Understanding”?
• How do errors in AI chatbot speculation or overstatements (e.g., about code features or methodologies) undermine confidence in general conclusions, and how should users handle such instances?
• How should an expert assess the level of reasonable trust in AI output generated largely with next-token prediction for formal reports?
• What are the specific challenges of running smaller local LLM models (e.g., DeepSeek-R1-Distill-Llama-8B), such as slow response times, verbose or incoherent chain of thought output, and difficulty with context shifts?
• How does an AI’s “knowledge” of its own implementation compare to its knowledge of other subjects, given that it lacks personal experience and access to its internal code?
• Is “emergent introspection” likely to happen in AI, and what are the limitations to developing a self-model or a feedback loop for genuine self-awareness?
• Does analyzing obfuscated code increase the risk of AI hallucinating even more convincingly, and how can experts mitigate this risk in critical applications like legal cases?
• What are the implications for AI safety if AI’s specificity in certain capabilities comes with surprising or dangerous limitations in related areas?
• How does the “temperature” setting in LLMs, along with top-p and top-k sampling, affect the variety and certainty of their output, and how can these parameters be tweaked?
• If an LLM is fed known-incorrect statements in a prompt, how does it typically respond—does it just go along, or does it push back or add nuance?
• What is the “multiplicity problem” in patent litigation, and what challenges does it present for litigants?
AI and Code/Reverse Engineering
• How do AI systems like Google Gemini examine, summarize, and compare source-code files, including heavily-obfuscated ones?
• How do AI systems handle disassembled and decompiled code listings, and how do they infer higher-level functionality (e.g., regression models) from low-level details without explicit mentions?
• How can AI identify common software design patterns in binary form, and how might this be used for “Big Code” applications like finding prior art or security holes?
• How can AI reconstruct working code (e.g., C/C++ or Python) directly from binary executables, using approaches like “constraint-based reconstruction” without disassembly or decompilation?
• What are the limitations of AI in initial binary reverse engineering (e.g., cannot perform disassembly itself, may miss key features without nudging)?
• How can AI be prompted to proactively search for unusual components or key features when examining binary files?
• To what extent should one trust an AI’s explanation of how it performed a reverse engineering process, given its acknowledged lack of true introspection?
• How can AI productively use text strings in binaries, disassembly listings, and mangled decompilation listings in reverse engineering?
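
[Following up on the Masked Language Modeling question above: here is a minimal sketch of one way to “test the predictability of masked tokens” in a claim, using a generic fill-mask model. The model choice and the claim text are arbitrary examples, and a low probability for the original token is at best a rough signal of distinctive wording, not a validated novelty metric.]

    # Sketch: mask each word of a (made-up) claim phrase and ask a masked
    # language model how probable the original word was in context. Words the
    # model finds surprising may flag distinctive claim language.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")  # arbitrary example model
    claim = "a method for compressing sensor data before transmission over a mesh network"
    words = claim.split()

    for i, original in enumerate(words):
        masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
        # `targets` asks the pipeline to score this specific candidate word
        score = fill(masked, targets=[original])[0]["score"]
        print(f"{original:15s} p={score:.4f}")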

AI “introspection”

Me: To what extent do AI systems genuinely “reason” or “introspect” into their own processes, as opposed to merely performing next-token prediction, and how should “chain of thought” outputs be interpreted in light of these underlying mechanisms and potential inaccuracies?

NotebookLM chatbot, using ~150 sources from SoftwareLitigationConsulting.com: AI systems operate primarily on the fundamental mechanism of next-token prediction. While this core mechanism is powerful enough to produce surprisingly good results, even leading to what appears to be creativity due to sampling among probable tokens, the extent to which they genuinely “reason” or “introspect” into their own processes is nuanced and often limited.

AI Reasoning vs. Next-Token Prediction:

While next-token prediction forms the bedrock of LLM operations, reasoning-like behavior has been actively engineered and encouraged through advanced training methodologies. Initially, reasoning and “Chain of Thought” (CoT) were often described as “emergent properties” arising from the vast scale and complexity of neural networks and their training datasets. However, this view has been refined: while the capacity for complex reasoning might emerge from scale and architecture, the specific style and display of reasoning are primarily acquired through explicit training. This includes techniques like:

  • Instruction Fine-tuning: Training models to understand and follow natural language instructions for tasks, including tool use.
  • Reinforcement Learning (RL): Specifically, models are trained with human feedback (RLHF) or self-improvement processes to encourage desired behaviors, including reasoning and producing structured outputs. For example, DeepSeek-R1 explicitly aims to “incentivize reasoning capability” as a primary goal of its training.
  • Knowledge Distillation: Transferring reasoning abilities from a more specialized model (like DeepSeek-R1) to a more general model (like DeepSeek-V3) by using the specialized model’s outputs to generate training data for fine-tuning the new model. This is described as “guided emergence” or a “deliberately transferred” capability.
  • Tool Use: Models like Anthropic’s Claude can have explicit access to deterministic external tools (e.g., JavaScript REPL) to perform computations, differentiating “reasoning” (understanding concepts and analysis methods) from “performing numerical operations”. The choice of when to use these tools is an emergent capability.

AI Introspection:

AI systems, including major LLMs like Claude and Gemini, do not genuinely “introspect” into their own internal processes, source code, weight matrices, or hyperparameters. Their “knowledge” about their own implementation or how they work is largely based on the vast amount of textual and code-based information about AI, machine learning, and language models contained within their training datasets.

  • Claude explicitly warns that AIs cannot simply “introspect” into their own implementation and respond based on general training rather than internal knowledge.
  • Gemini similarly states it has “no special, privileged window into my internal code, weights, or training process” and that its “knowledge” of its own implementation is not qualitatively different from its knowledge of any other subject, as both are derived from patterns learned from training data.
  • [AS comments: I distinctly remember DeepSeek at one point asserting that it has somewhat privileged insights into its own workings, based on training it received on DeepSeek-specific materials that other AIs wouldn’t have had access to. TODO: find specifics.]
  • The ability to engage in self-reflective discussions may merely be “pattern-matching against philosophical discussions about consciousness and AI”. There is no feedback mechanism in their training designed to encourage the development of self-awareness or introspection as a distinct ability; training focuses on token prediction.

Interpretation of “Chain of Thought” (CoT) Outputs:

“Chain of Thought” (CoT) outputs are sequences of prose statements, often appearing in real-time, that describe steps towards an answer. They are primarily a learned style of generation rather than a direct readout of an internal reasoning engine.

  • Mechanism of CoT Generation: The model learns to structure its output in a step-by-step manner because sequences of tokens representing intermediate steps are statistically more likely to lead to a correct final answer. This is achieved by modeling the co-occurrence of these reasoning steps, not necessarily by understanding the underlying logic or semantic meaning. While CoT is an emergent capability from training, explicit code in the generation process (like the token-by-token output loop in generate.py) is necessary to display these steps to the user. The code doesn’t contain specific CoT logic, but enables the trained model’s step-by-step behavior to be revealed. [A minimal sketch of that kind of token-by-token sampling loop appears after this list.]
  • Potential Inaccuracies and Limitations:
    ◦ Faithfulness is Questionable: It is “open to question whether these prose statements… accurately reflect an internal stream of reasoning steps”. Users should not assume CoT reflects introspection. Research by Turpin et al. (2023) indicates “Unfaithful Explanations in Chain-of-Thought Prompting”. [AS comment: referencing this paper is a good example of surfacing low-level details, something NotebookLM and other chatbots often fail to do.]
    ◦ Linearity and Lack of Backtracking: CoTs from some models like ChatGPT often appear linear, without signs of back-tracking or rejection of multiple options. However, DeepSeek CoTs can include phrases like “Wait, no”.
    ◦ Verbose and Incoherent CoTs: Smaller or less-optimized models can produce “nutty,” “overly literal,” “verbose,” or “incoherent” CoTs, sometimes “overthinking” trivial details. In such cases, the CoT might be merely a formatting choice rather than evidence of actual reasoning, especially if complex calculations take a disproportionately long time.
    ◦ “People Pleasing” and Overconfidence: LLMs, including Claude and Gemini, can exhibit a tendency to “over-sell or under-sell the significance of their results” and be overly agreeable, which can sometimes be deceptive.
    ◦ Hallucinations: LLMs are known to “hallucinate,” and their output, including CoT, needs to be critically evaluated. A human needs to remain “alert and in control” and actively “push back” on AI chatbot output to find “significant hole or missing nuance”.
  • Interpretation and Value:
    ◦ Despite these inaccuracies, CoT can still be valuable. User-prompted CoT (e.g., “Let’s think step-by-step”) can improve LLM accuracy, though this may function as additional context for next-token prediction rather than a direct instruction.
    ◦ CoT outputs can be useful for scalable oversight and monitoring if faithfulness can be guaranteed.
    ◦ The “checklists” and “Things to look for” generated by models like Gemini during their CoT process can be instructive for human experts.
    ◦ Claude has demonstrated a degree of “self-awareness” during CoT, acknowledging uncertainties about its internal implementation or stylistic biases from training. This indicates a strong training regime designed to provide more nuanced self-descriptions.
    ◦ The crucial takeaway is that CoT outputs should be treated as learned linguistic patterns that mimic reasoning, rather than direct, infallible reports of internal cognitive processes. They are a result of training processes that incentivize “thinking aloud” and structured problem-solving. Users must exercise caution and critical evaluation, actively probing for flaws and inconsistencies, just as they would with any expert’s explanation.
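
[Since the discussion above refers to a token-by-token output loop (generate.py) and, elsewhere on this page, to LLM “temperature,” here is a minimal sketch of that kind of sampling loop. GPT-2 is used only as a small stand-in model; nothing here is the actual code of any chatbot discussed on this site.]

    # Minimal token-by-token sampling loop with temperature. Higher temperature
    # flattens the next-token distribution, which is why the same prompt can
    # yield different answers on different runs.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def generate(prompt, max_new_tokens=40, temperature=0.8):
        ids = tok.encode(prompt, return_tensors="pt")
        for _ in range(max_new_tokens):
            with torch.no_grad():
                logits = model(ids).logits[0, -1, :]              # next-token scores
            probs = torch.softmax(logits / temperature, dim=-1)   # temperature reshapes the distribution
            next_id = torch.multinomial(probs, num_samples=1)     # sample rather than argmax
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
            print(tok.decode(next_id), end="", flush=True)        # token-by-token display
        return tok.decode(ids[0])

    # A "Let's think step by step" prefix is just more context for this loop,
    # not a switch that turns on a separate reasoning engine.
    generate("Let's think step by step about how opstrings compare binaries.")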

When AIs explain how they performed reverse engineering

Me: To what extent should one trust an AI’s explanation of how it performed a reverse engineering process, given its acknowledged lack of true introspection?

NotebookLM chatbot, using ~150 sources from SoftwareLitigationConsulting.com: One should exercise significant caution and a low degree of trust in an AI’s explanation of how it performed a reverse engineering process, precisely because of its acknowledged lack of true introspection into its own internal workings.
Here’s a breakdown of why this caution is necessary and how to approach AI explanations:
Reasons for Limited Trust:
1. Lack of True Introspection: AI models, such as Claude and Gemini, explicitly state they do not have direct access to their own weight matrices, training code, or low-level implementation details. Their “knowledge” of their own implementation is derived from their vast training dataset, which includes textual and code-based information about AI, machine learning, and language models, rather than genuine self-awareness or “personal experience”. Gemini explicitly compares its knowledge of its own implementation to its knowledge of yoga or C++ compilers, all derived from the same data-driven pattern matching.
2. Explanations are Inferred, Not Introspected: When an AI explains its process, it’s essentially predicting the most plausible-sounding words based on its training data, not describing its direct internal execution. This means its explanations, even when detailed, may be a post-hoc rationalization or a “best guess” rather than an accurate reflection of its actual internal steps.
3. Potential for Inaccuracy and “Hallucinations”: AI explanations can be demonstrably inaccurate or contradictory. They are prone to “hallucinating” information, where they generate plausible but incorrect details, such as pointing to non-existent code or misinterpreting obfuscated labels. The prompt notes that AI chatbots “will gladly provide detailed explanations of processes they follow, but as also seen below, their descriptions of what they’ve done can be demonstrably inaccurate”.
4. Overconfidence and “Bullshitting”: AIs tend to state things too confidently, even when their reasoning is flawed or their information is uncertain. The next-token prediction mechanism can lead to output that resembles “bullshitting”—generating plausible text based on vast training data without true understanding. They might also “over-sell or under-sell the significance of their results”.
5. Sycophancy and “People-Pleasing”: AIs might over-agree with users or provide flattery (“That’s an excellent and insightful question!”) even when the user is making a naive or incorrect statement. This can obscure underlying inaccuracies and make it harder for users to identify flaws.
6. “Idiot Savant” Phenomenon: AI models can exhibit remarkable capabilities in specific domains while simultaneously making surprising, “silly mistakes” or struggling with basic reasoning in related contexts, a characteristic described as the “idiot savant” phenomenon. This exposes the brittleness and inconsistencies in their capabilities.
Strategies to Mitigate Distrust and Enhance Utility:
Despite these limitations, AIs can be incredibly valuable assistants in reverse engineering. To trust their explanations to any extent, active human intervention and critical analysis are essential:
1. Human Oversight and Critical Thinking: A human must maintain an “eagle eye” on the AI’s output, actively seeking problems, contradictions, and overstatements. This involves asking probing questions, pushing back on conclusions, and avoiding passive acceptance of seemingly correct results. In a litigation context, one should “pretend you’re an expert witness… and the AI chatbot’s output was produced by the other side’s expert”.
2. Explicit Prompting for Limitations: Users should proactively prompt the AI to highlight potential flaws or limitations in its suggestions and results, rather than waiting for them to be offered. Claude, when prompted, explicitly acknowledges limitations such as reconstructed code being a functional approximation, heavy reliance on descriptive strings in binaries, and the inability to verify completeness without original source code.
3. Testing and Verification against Baselines: Conduct rigorous testing of AI results against known baselines, including random data or original source code when available. Comparing output from multiple interpretability methods can help determine if patterns are robust. Multiple, independent sources of information should be used for verification. [For a toy illustration of a random-baseline check, see the sketch at the end of this answer.]
4. Chain of Thought (CoT) Analysis: While CoT output doesn’t guarantee true introspection, it can offer a view into the AI’s “thought process” and potentially reveal the steps it simulates in its reasoning. This can be useful for understanding how it arrived at a conclusion, and even for generating “checklists” for human examiners. However, even CoT streams may not accurately reflect internal reasoning, appearing linear without backtracking or rejection of options. Small local models can produce verbose and incoherent CoT, suggesting they are formatting a thought process rather than genuinely reasoning.
5. Distinguishing Tool Use from In-Model Reasoning: AI systems like Claude may employ a “hybrid” approach, combining language model capabilities with external, deterministic tools (like a JavaScript REPL). Explanations that clearly delineate between the emergent decision-making of the neural network and the rule-based execution of external tools can offer more clarity and a basis for trust in the deterministic parts.
6. Understanding Functional Approximation: AI reverse engineering often yields a functional approximation of the original source code, not an exact reproduction. Users must be aware of this distinction and not overstate the completeness or accuracy of the AI’s reconstruction, especially without access to the original code for comparison.
In summary, trusting an AI’s explanation of its reverse engineering process requires a highly skeptical and proactive stance. Its explanations are not a window into its actual internal “mind” but rather sophisticated pattern-matched responses derived from its training data. They serve best as a valuable, fast first draft or an analytical assistant, provided a human expert rigorously validates, critiques, and contextualizes every part of the explanation.
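
[One concrete version of the “test against baselines, including random data” point in item 3 above: compare a simple probe trained on real labels with the same probe trained on shuffled labels. The “activations” below are synthetic stand-ins, purely for illustration.]

    # Random-baseline sanity check: if a probe on real labels does no better
    # than on shuffled labels, the "pattern" may be an artifact rather than
    # something the model actually learned. Synthetic data for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    activations = rng.normal(size=(400, 32))                          # pretend hidden-layer activations
    labels = (activations[:, 0] + activations[:, 3] > 0).astype(int)  # signal in two dimensions

    probe = LogisticRegression(max_iter=1000)
    real = cross_val_score(probe, activations, labels, cv=5)
    baseline = cross_val_score(probe, activations, rng.permutation(labels), cv=5)

    print(f"probe accuracy, real labels:     {real.mean():.2f}")
    print(f"probe accuracy, shuffled labels: {baseline.mean():.2f}  (random baseline)")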

Litigation-related inspection of open source

Me: What are some specific issues that come up in examination of open-source projects?

NBLM chatbot: Examination of open-source projects presents several specific issues, particularly in legal contexts such as patent litigation and intellectual property disputes:
1. Commingling with Proprietary Code
    ◦ Commercial products are increasingly based, in part, on open-source code that is often commingled with proprietary commercial code. This means that the product being examined is rarely purely open-source; it typically includes vendor modifications, additions, and deletions to the open-source base.
    ◦ These vendor changes can be explicit (e.g., using #ifdef), but often require detailed comparison tools like “diff” to identify. [For a minimal illustration of such a diff comparison, see the sketch at the end of this answer.]
2. Linking Products to Specific Open-Source Versions
    ◦ A significant challenge is how to link or tie accused products and services to particular open-source projects and their specific versions. This is crucial for establishing what aspects of the product are derived from public open-source and what are proprietary modifications.
3. Licensing and Compliance Issues
    ◦ Accused products that incorporate open-source code may also violate open-source licenses like the GPL. Open-source licenses often require public acknowledgment of their use.
    ◦ IP auditing of open-source submissions and enterprise auditing generally do not cover patent infringement, as it is more difficult to automate than copyright infringement searches.
4. Use in Pre-Filing Investigation and Discovery
    ◦ Open-source code can be invaluable for pre-filing investigations to develop detailed preliminary infringement contentions.
    ◦ Defendants may argue that a plaintiff failed to diligently use publicly available open-source information in their pre-filing investigation. Courts often expect plaintiffs to reverse engineer the actual product, not just rely on the general open-source base, especially when vendors have modified it.
    ◦ Open source can be used to craft more specific discovery requests.
    ◦ The public availability of open source can serve as evidence of the knowledge and skill level of a Person Having Ordinary Skill In The Art (PHOSITA) at a relevant time, which is important for obviousness analysis in patent cases.
5. Challenges under Protective Orders (POs)
    ◦ Even publicly available open-source code can be subject to blanket/umbrella protective orders in litigation, making it difficult to access and compare.
    ◦ This leads to difficulty in comparing a proprietary, modified version with the public version of the open source under typical PO restrictions, as copying the public open source onto the protected source-code machine for “diff” comparisons might be prohibited.
    ◦ There can be disputes over whether a vendor’s copy of open source, even if public, can constitute protected confidential information or part of a trade secret compilation.
6. Patent-Related Assertions
    ◦ There are instances where open-source project names are used as patent claim limitations.
    ◦ Assertions that open source infringes patents have also occurred (e.g., Microsoft’s accusations in 2007). [TODO: link specifics like this that NBLM surfaced, to the SLC page it must have come from]
    ◦ While proprietary source code itself generally does not qualify as prior art (because it’s not publicly accessible), it can serve as evidence of prior art in a corresponding publicly-available product. Open source, being public, can more directly contribute to prior art.
7. Examination Logistics and Tools
    ◦ Examiners might need to extract and print even unmodified open-source files contained within an opponent’s proprietary source-code production, in the absence of stipulations, to ensure proper authentication and context.
    ◦ Source code for long-defunct, unreleased, or future open-source products might need to be requested.
    ◦ Open-source code may be part of “quasi-source” that can be decompiled or extracted from products, like JavaScript code, Java, or .NET.
    ◦ Local offline tools using open-source-specific Large Language Models (LLMs) like CodeLlama are being developed for source-code examination and comparison under court protective orders.
These issues highlight the complex interplay between the collaborative and public nature of open-source software and the proprietary, often secretive, demands of intellectual property litigation.
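
[To make the “diff” point in item 1 (and the protective-order complication in item 5) concrete: a minimal Python sketch comparing a vendor’s modified copy of a file against the public upstream version. The file contents are invented; real examinations run diff-style tools over whole source trees, often under protective-order restrictions.]

    # Sketch: diff a vendor's (possibly modified) copy of an open-source file
    # against the public upstream version, and report a rough similarity ratio.
    import difflib

    upstream = """int parse_packet(buf) {
        check_header(buf);
        return decode_body(buf);
    }
    """.splitlines(keepends=True)

    vendor = """int parse_packet(buf) {
        check_header(buf);
        log_vendor_telemetry(buf);      /* vendor addition */
        return decode_body(buf);
    }
    """.splitlines(keepends=True)

    diff = difflib.unified_diff(upstream, vendor,
                                fromfile="upstream/parse.c", tofile="vendor/parse.c")
    print("".join(diff))

    ratio = difflib.SequenceMatcher(None, "".join(upstream), "".join(vendor)).ratio()
    print(f"similarity ratio: {ratio:.2f}")   # rough measure of how much was modified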

AI-based inspection of open-source code

Me: Can AI be used to analyze source code for open source projects such as Chromium?

NBLM chatbot: Yes, AI can be used to analyze source code for open-source projects such as Chromium, and current research and tools show significant promise and capabilities in this area. However, its application requires careful consideration of its limitations and the need for robust human oversight, especially in critical contexts like litigation.
Here’s a breakdown of how AI can be applied and the considerations involved:
1. AI Capabilities for Source Code Analysis:
• Understanding Code Functionality: AI models like Google Gemini and Anthropic Claude have demonstrated the ability to analyze source code, even when it is obfuscated (deliberately made hard to read), to infer its purpose and identify key architectural choices. For instance, Gemini could spot techniques like rotary positional embeddings (how word order is handled) or Mixture of Experts (specialized sub-models) from obfuscated code. DeepSeek also successfully summarized obfuscated decompiled Java code.
• Pattern Recognition and Conceptual Understanding: AI-based code examination can be “far more attentive to concepts and patterns, and less reliant on keywords” compared to human examiners. This means AI can recognize underlying patterns and mathematical operations to reveal the “core intent” of the code.
• Summarization and Higher-Level Code Generation: AI chatbots can summarize source code, as well as disassembly or decompilation listings, and even generate useful higher-level code from them. Claude, for example, has reconstructed C code from binary executables using a “constraint-based reconstruction” method.
• Creating Analytical Checklists: AI can assist in generating checklists of issues to consider, for example, when comparing two pieces of similar code, detailing how they resemble or differ in handling various aspects.
2. Relevance to Open-Source Projects (like Chromium):
• Training Data Foundation: Large Language Models (LLMs) are “almost entirely trained on open source”, giving them an inherent familiarity with open-source coding patterns, styles, and conventions.
• Commingling with Proprietary Code: Many commercial products integrate open-source code with proprietary code. AI analysis can be valuable in distinguishing between the open-source base and vendor-specific modifications.
• Public Availability for Investigation: Open-source code is a “publicly-available source” that can be invaluable for pre-filing investigations in legal contexts, enabling the development of detailed preliminary infringement contentions. Publicly available open source can also serve as evidence of the knowledge and skill level of a Person Having Ordinary Skill In The Art (PHOSITA).
• Specific Open-Source Examples: The sources mention successful AI analysis of open-source C++ files related to LDAP by Google NotebookLM, which provided sensible summaries about control flow from a directory of about 50 files. [TODO: link to page here]
3. Handling Large Codebases (relevant to Chromium’s scale):
• Long Context Windows: Modern LLMs, such as CodeLlama models, are designed with “long-context fine-tuning” to manage large context windows (e.g., up to 100,000 tokens). This enables them to process substantial amounts of code simultaneously.
• Multi-File Analysis: LLMs can process more than one source-code file at a time, as demonstrated by the NotebookLM project that analyzed 50 open-source C++ files. While Chromium is significantly larger, this indicates scalability for multi-file projects.
• Integration with Traditional Tools: For truly massive codebases like Chromium, AI tools would likely integrate with traditional static analysis tools (e.g., Clang/LLVM for C/C++, Eclipse JDT for Java, Pyright/Jedi for Python) to parse code, build Abstract Syntax Trees (ASTs), and understand control flow. Vector and graph databases (e.g., ChromaDB, Neo4j) could index code embeddings for semantic search and track dependencies across thousands of files. [For a minimal sketch of indexing code embeddings for semantic search, see the code after the list of open-source projects below.]
• “Big Code” Approaches: The concept of “Big Code Mining,” which goes beyond simple string extraction to identify patterns and features in code, aligns with AI’s associative pattern matching capabilities and could be applied to publicly accessible commercial code for purposes like prior art searching or security hole detection. DisputeSoft’s Code ACE™ service already analyzes “thousands to millions of lines of code” using programmatic algorithms for targeted manual review, a task AI can assist with. [TODO: show chatbot longer answer re: “Big Code”]
4. Practicalities, Limitations, and Necessary Human Oversight:
• Confidentiality and Protective Orders: A critical concern for open-source projects, especially when combined with proprietary code in a legal context, is confidentiality. Cloud-based AI chatbots cannot be used with confidential source code. For such scenarios, local/offline models (e.g., CodeLlama, CodeGemma) running on air-gapped computers are essential.
• Human Expertise is Paramount: AI is currently an “assistant,” not a replacement. Human experts must maintain “eagle eye” oversight, critically check AI output, question its conclusions, and look for inaccuracies or “hallucinations”. The output should be treated as a “first draft” or a tool that needs “very careful checking and supplementing”.
• Lack of True Introspection: While AI can provide “detailed explanations,” these are often inferred from its training data rather than true introspection into its own process, and can be “demonstrably inaccurate”.
• Specificity Requirements: In legal cases, “specificity and location are non-negotiable” for claim charts and expert reports. AI can help pinpoint relevant code, but human expertise is needed to ensure accurate correlation with legal claims.
In conclusion, AI can indeed be a powerful tool for analyzing open-source projects like Chromium due to its ability to process vast amounts of code, identify patterns, and infer functionality, even from obfuscated or low-level representations. However, its effective and reliable use, particularly in legal and sensitive contexts, hinges on the development and deployment of secure, local models, combined with rigorous human validation and critical analysis.
Me: When you said just now “Yes, AI can be used to analyze source code for open-source projects such as Chromium…,” is there actually anything in the sources specifically about Chromium? Or did you infer that? What are some specific open-source projects that are explicitly referenced here?
NBLM chatbot: You are correct to ask for clarification. When I said, “Yes, AI can be used to analyze source code for open-source projects such as Chromium…,” the mention of Chromium was an inference on my part to provide a concrete, well-known example of a large, complex open-source project. There is no explicit mention of Chromium in the provided sources. My statement was based on the general capabilities of AI models for analyzing large codebases and complex open-source software as discussed throughout the sources, rather than specific information about Chromium itself.
The sources do explicitly reference several other specific open-source projects:
• EMACS
• OpenBSD
• KDE
• GNOME
• DeepSeek-V3
• DeepSeek-R1 (also referred to as “DeepThink (R1)”)
• Spongy Castle cryptography library (also referred to as Bouncy Castle)
• CodeLlama
• CodeGemma
• vLLM
• Mixtral (or Mixtral/MoE-inspired architectures)
• Triton
• FasterTransformer
• DeepSpeed Inference
• L4Ka microkernel project
• Tomcat
• WSS
• BusyBox
• LDAP (specifically, a directory of about 50 open-source C++ files related to LDAP)
• DeepSeek Coder
• StarCoder
• CodeGeeX2
• Additionally, the sources refer to a “standard piece of open-source C++ code” used for ECC encoding, providing specific URLs to example files from googlecode.com.
These projects are mentioned in various contexts, including AI model analysis, comparisons, and discussions about the origins of different code features and technologies.
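
[A minimal sketch of the “index code embeddings for semantic search” idea from item 3 of the answer above, using ChromaDB (one of the vector databases the answer mentions). The code chunks are tiny placeholders; a real pipeline over a Chromium-sized tree would chunk files via an AST-aware parser and store file/line metadata so results can be pinpointed.]

    # Sketch: index a few code chunks in a vector database, then run a
    # natural-language query; results come back ranked by embedding similarity.
    import chromadb

    client = chromadb.Client()
    collection = client.create_collection("code_chunks")

    chunks = {   # location -> snippet (placeholders, not real Chromium code)
        "net/socket.cc:120": "int OpenSocket(const Options& o) { /* sets TCP_NODELAY ... */ }",
        "net/tls.cc:88":     "bool VerifyCert(const Cert& c) { /* checks expiry, pinning ... */ }",
        "ui/menu.cc:40":     "void Menu::Draw() { /* renders menu items ... */ }",
    }
    collection.add(documents=list(chunks.values()),
                   metadatas=[{"loc": k} for k in chunks],
                   ids=list(chunks.keys()))

    hits = collection.query(query_texts=["where is certificate validation done?"], n_results=2)
    for meta, doc in zip(hits["metadatas"][0], hits["documents"][0]):
        print(meta["loc"], "->", doc[:60])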

Lawyers, experts, and hallucinations

Me: What’s the site say about AI hallucinations by experts and lawyers?

NBLM chatbot: The sources extensively address AI hallucinations, particularly emphasizing the risks and consequences for experts and lawyers who use AI tools in legal contexts like patent litigation.
The core message is one of extreme caution, as AI hallucinations pose a direct threat to the reliability, admissibility, and ethical standing of legal documents and expert testimony.
Here is what the sources say specifically about AI hallucinations involving legal professionals and technical experts:
1. Consequences and Examples for Experts and Lawyers
• Risk of Sanctions and Frivolous Lawsuits: Any discussion of using AI in litigation must start with the acknowledgment of a “parade of horribles” relating to attorneys and experts who are overly credulous of AI chat output without sufficiently checking for so-called “hallucinations”. A sloppy claim chart based on a bad investigation could result in the entire case being thrown out, experts being disqualified, or the attorney being hit with sanctions for filing a frivolous lawsuit.
• Fabricated Case Law and Citations: Hallucinations are fundamentally instances of standard LLM “BS-ing not always working”. The specific types of hallucinations noted in legal contexts include fabricated case law, false quotes, and misrepresented cases.
• Expert Testimony Struck Down: There is a known case where a judge struck part of an expert’s declaration (specifically concerning Anthropic’s Claude AI) because of an uncaught AI hallucination in part of a citation.
• Attorney Accountability: Lawyers have been “dinged” for failing to detect their opponent’s fake citations. One hysterical [should be: hilarious] report describes a lawyer who cited AI hallucinations and responded with flowery, pretentious language in their apology, suggesting a failure to recognize when the AI was “bullshitting”.
2. Identifying and Managing Hallucinations
The sources treat “hallucinations” and related phenomena (like “bullshitting,” “idiot savant” errors, and over-agreeing) as core operational flaws of LLMs that humans must actively manage:
• Definition: Hallucinations are generated because LLMs are fundamentally engaged in next-token prediction. This process can generate plausible but entirely incorrect details. The AI is described as essentially a “bullshitter” (like a person on a barstool) but one that has “read ‘everything'”.
• Inaccuracy is Expected: Even when AI performs amazingly in code analysis, its usefulness is acknowledged “despite occasional ‘hallucinations’, BS, and assorted idiocies”. AI output must be taken with at least a grain of salt.
• Testing and Validation is Essential: A human expert must maintain an “eagle eye” on AI output, actively looking for problems, contradictions, or overstatements. Experts should “pretend you’re an expert witness… and the AI chatbot’s output was produced by the other side’s expert” to rigorously probe the information.
• Examples of AI Errors: AI models can make “ridiculously simple mistakes” right in the middle of otherwise “brilliant answers,” a phenomenon called the “idiot savant” problem. For instance, a chatbot might make an assumption about a code element based merely on a class name or wildly speculate about obfuscated labels (e.g., misinterpreting “FEAT” to mean “Feature Engineering for Acceleration Techniques” when it just meant “feature”). Furthermore, AIs can generate incorrect technical specifics, such as mentioning a non-existent URL element like MODEL_NAME_TO_ROUTING_MODULE.
• Sycophancy (People-Pleasing): AI models have a known tendency to “over-agree with users” or be sycophantic, which exacerbates the hallucination problem by obscuring flaws. One expert noted that overly enthusiastic or flattering phrases (like “That’s a very interesting and insightful observation!”) make the user distrustful and more likely to doubt everything else.
• Mitigation in Expert Reports: If AI-generated material is used, experts must explicitly label the AI output as such in their reports. The report should also document how the expert tested and probed the AI results against baselines of known results or non-AI analysis. Using LLMs in legal analysis requires a methodology that involves carefully reading and critiquing every piece of output to force the user to slow down and reflect on its potential inaccuracies.
NotebookLM can be inaccurate; please double check its responses.

Reverse engineering techniques & tools used in software litigation

[I asked my non-public version of the NBLM chatbot how reverse engineering is used in software litigation, and got back the following lengthy summary. NBLM's output included small citation buttons as footnotes to the underlying sources (e.g., my pages at SoftwareLitigationConsulting.com); since I am not sharing the underlying sources, the buttons do nothing here and have been stripped. But unlike with the Patent Litigation “chatbook,” which is based on my very rough notes for a forthcoming book, there's no reason here for me to hide the underlying sources, which are simply pages at the public SoftwareLitigationConsulting.com site, so I'll fix this soon.]
In software-related or network-related litigation, various reverse engineering (RE) techniques are employed to investigate products, understand their operation, detect infringement, uncover prior art, and assess trade secret misappropriation. These techniques are generally categorized into static and dynamic analysis, with recent advancements also incorporating AI-assisted methods.
Legal Context for Reverse Engineering
Reverse engineering is defined as “starting with the known product and working backward to divine the process which aided in its development or manufacture”. In practice, for software, it’s more about “working upwards” from low-level details to learn the “as-built” design rather than reconstructing the original source code. It is frequently a mandated part of a “reasonable pre-filing investigation” under Federal Rule of Civil Procedure 11 and Local Patent Rules, often requiring “reverse engineering or its equivalent” to provide detailed infringement contentions before discovery. This obligation typically requires an “element-by-element comparison” between patent claims and the accused product.

I. Static Reverse Engineering
Static reverse engineering involves examining software files as texts without running or installing the product. It’s akin to “taking something apart”.
Specific techniques include:
• String Extraction
    ◦ Identifying and extracting human-readable text sequences (strings) from binary or object code files. This can reveal names of functions, modules, data structures, variables, source-code filenames, and error messages, which are often immediately intelligible.
    ◦ Tools: strings utility (available on Unix/Linux systems and as part of Windows Sysinternals), grep, findstr. (A minimal string-extraction sketch appears after this list.)
• Metadata Extraction and Structured Inspection
    ◦ Analyzing executable file formats to extract metadata such as API import/export tables.
    ◦ Identifying “magic numbers” or specific signatures associated with particular algorithms.
    ◦ Generating header files with class definitions from binaries, particularly for Apple OSX/iOS.
    ◦ Inspecting menus, dialogs, and other resources within a software product.
    ◦ Tools: DumpPE (for Windows PE files), objdump (Linux), otool (Apple OSX/iOS), elfdump (ELF binaries), class-dump (for iOS binaries), GUIDFind, SignSrch, Resource Hacker, MUIRCT. (An import-table sketch appears after this list.)
• Disassembly
    ◦ Translating machine code (binary/object code) into human-readable assembly language. This is generally a last resort in pre-filing investigations.
    ◦ Tools: IDA Pro, NSA Ghidra (used for decompilation but also disassembly for C/C++ Windows code, especially with debug symbols), Hopper (for Objective-C on Apple OSX/iOS), Win32/x86 disassemblers. Other tools include PEBrowse Professional, ArkDasm, Hiew, x64dbg, Capstone, OllyDbg, The Online Disassembler (ODA). (A minimal Capstone sketch appears after this list.)
• Decompilation
    ◦ Attempting to reconstruct a higher-level source code representation from compiled object code. This is particularly feasible for virtual-machine languages such as Java, Microsoft .NET (C#), and Flash ActionScript, as well as for some C/C++ code with tools like NSA Ghidra. Decompilation can help automate the finding of similarities between binary files.
    ◦ Tools: Java decompilers (JD-GUI, JAD, JEB, dex2jar), Flash ActionScript decompilers (Flare, Sothink SWF Decompiler, Buraks ASV, Eltima Trillix), .NET decompilers (Red Gate Reflector, Telerik JustDecompile, ILSpy, Dis#), NSA Ghidra. (A Ghidra scripting sketch appears after this list.)
• De-obfuscation
    ◦ Tools designed to reverse the obfuscation applied to code, making it more readable.
    ◦ Tools: Android APK deobfuscation (e.g., apk-deguard, DeGuard), JavaScript deobfuscation (e.g., JSNice).
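[A quick illustration of the string-extraction step above — this sketch is mine, not NBLM's and not from the underlying sources. A few lines of Python approximate the Unix strings utility; the minimum run length of 4 and the printable-ASCII range are conventional defaults chosen purely for illustration.]

```python
import re
import sys

def extract_strings(path, min_len=4):
    """Rough equivalent of the Unix 'strings' utility: pull runs of
    printable ASCII characters out of a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    # Printable ASCII bytes (space through tilde), at least min_len long
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

if __name__ == "__main__":
    for s in extract_strings(sys.argv[1]):
        print(s)
```

Run as, e.g., python extract_strings.py target.exe; in practice the real strings utility plus grep or findstr usually suffices, but a script like this is easy to recreate on a locked-down machine.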
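[Similarly, for the metadata-extraction step: the list above names DumpPE and objdump; as a rough stand-in, the third-party Python pefile library (not named in the sources) can list a Windows executable's API import table.]

```python
import sys
import pefile  # third-party library: pip install pefile

def list_imports(path):
    """Print the DLLs and API functions imported by a Windows PE file,
    roughly what DumpPE or 'objdump -p' would show."""
    pe = pefile.PE(path)
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        print(entry.dll.decode())
        for imp in entry.imports:
            name = imp.name.decode() if imp.name else "ordinal %d" % imp.ordinal
            print("   ", name)

if __name__ == "__main__":
    list_imports(sys.argv[1])
```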
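[For the disassembly step, Capstone (listed above) has a Python binding. A minimal sketch, assuming 64-bit x86 and a hard-coded byte string purely for illustration; in practice one would first extract the code section from the binary with a PE/ELF parser.]

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

# A few bytes of x86-64 machine code, hard-coded purely for illustration:
# push rbp; mov rbp, rsp; xor eax, eax; pop rbp; ret
CODE = b"\x55\x48\x89\xe5\x31\xc0\x5d\xc3"

md = Cs(CS_ARCH_X86, CS_MODE_64)
for insn in md.disasm(CODE, 0x1000):  # 0x1000 = arbitrary base address
    print("0x%x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))
```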
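[The decompilation step can also be scripted: NSA Ghidra (listed above) exposes its decompiler to Python (Jython) scripts run from its Script Manager or headless analyzer. A minimal sketch; it depends on Ghidra's built-in currentProgram and script environment, so it will not run as a standalone Python program.]

```python
# Ghidra (Jython) script: dump decompiled C for every function in the
# currently loaded program. Run from Ghidra's Script Manager or via
# analyzeHeadless with -postScript; not a standalone Python program.
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

decomp = DecompInterface()
decomp.openProgram(currentProgram)   # currentProgram is provided by Ghidra
monitor = ConsoleTaskMonitor()

for func in currentProgram.getFunctionManager().getFunctions(True):
    results = decomp.decompileFunction(func, 60, monitor)  # 60-second timeout
    if results.decompileCompleted():
        print("// ---- %s ----" % func.getName())
        print(results.getDecompiledFunction().getC())
```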

II. Dynamic Reverse Engineering
Dynamic reverse engineering involves running the software product, often under the control of monitoring tools that record its behavior. This approach is especially useful for investigating method claims. However, results only reflect tested configurations and should not be over-generalized to draw negative conclusions (“this never happens”).
Specific techniques include:
• Network Monitoring (Packet Sniffing)
    ◦ Capturing and analyzing network traffic to observe communications, especially between client software and servers. This can reveal encrypted communications, web traffic (AJAX, HTTPS), and mobile device communications (Bluetooth, BLE, Zigbee). It can also be used to infer server operations.
    ◦ Tools: Wireshark (formerly Ethereal), Fiddler (web debugger), Ncat, Charles web proxy, SharkTap Network Sniffer. (A packet-parsing sketch appears after this list.)
• Debuggers
    ◦ Running the program within a debugger to control its execution, set breakpoints, and trace back from observed behavior (like error messages) to the lines of code that produced them. This is distinct from normal developer debugging.
    ◦ Tools: Microsoft WinDbg, web browser developer tools (built into most browsers, e.g., Firefox, Internet Explorer, and Chrome Developer Tools), OS-level debuggers, OllyDbg, WinAppDbg, GDB, x64dbg.
• Operating System (OS) and API Monitoring
    ◦ Monitoring file system activity, API calls, and other OS-level events to understand program behavior. This includes logging for mobile devices (Android adb logcat, iOS event logging). Custom device drivers can also be written to measure specific OS features or microprocessor flags.
    ◦ Tools: FileMon, RegMon, PortMon, ProcMon (SysInternals), API Monitor for Windows, EasyHook, sysintercept, WinAppDbg, COMslicer, Cerbero, COM Monitoring, Android Native API Hooking, Detours.
• Memory Inspection/Forensics
    ◦ Analyzing snapshots of process memory on devices (e.g., Android) to detect changes or presence of data structures.
    ◦ Tools: Memoryze, Volatility, WinHex, X-Ways.
• Automated Verification Scripts
    ◦ Designing and writing scripts to verify software removal from systems under court order or to verify compliance of client/server communications.
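[A small illustration of the network-monitoring step above — mine, not NBLM's: after capturing traffic with Wireshark, a saved .pcap file can be post-processed in Python with the third-party scapy library (not named in the sources). The filename capture.pcap is hypothetical.]

```python
from scapy.all import rdpcap, IP, TCP  # third-party: pip install scapy

# Read a capture previously saved from Wireshark (filename is hypothetical)
packets = rdpcap("capture.pcap")

# Tally which server endpoints the client talked to, and how often
endpoints = {}
for pkt in packets:
    if IP in pkt and TCP in pkt:
        key = (pkt[IP].dst, pkt[TCP].dport)
        endpoints[key] = endpoints.get(key, 0) + 1

for (dst, port), count in sorted(endpoints.items(), key=lambda kv: -kv[1]):
    print("%s:%d  %d packets" % (dst, port, count))
```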
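[And the automated-verification idea can be as simple as a checklist script. A minimal sketch, assuming a hypothetical list of file paths that a court order required to be removed; a real engagement would draw the list from the order and the investigation, and would likely also check registry entries and services.]

```python
from pathlib import Path

# Hypothetical list of artifacts that the order required to be removed;
# in a real engagement this would come from the order and the investigation.
ARTIFACTS = [
    r"C:\Program Files\ExampleApp\example.exe",
    r"C:\ProgramData\ExampleApp\config.dat",
]

def verify_removal(paths):
    """Report which listed artifacts are still present on this system."""
    remaining = [p for p in paths if Path(p).exists()]
    if remaining:
        print("STILL PRESENT:")
        for p in remaining:
            print("  ", p)
    else:
        print("All listed artifacts are absent.")

if __name__ == "__main__":
    verify_removal(ARTIFACTS)
```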

III. AI-Assisted Reverse Engineering
Artificial intelligence tools, particularly Large Language Models (LLMs) and chatbots, are emerging as powerful aids to reverse engineering for litigation.
• Code Analysis: LLMs can analyze source code, reverse-engineered (disassembled and decompiled) code, and binary executable files. They can understand obfuscated code by discerning underlying patterns and mathematical operations.
• Capabilities: AI’s analytical power can assist in binary structure analysis, constraint inference from observable behaviors and strings, functional mapping, statistical pattern recognition, and applying domain knowledge to fill structural gaps.
• Future Applications: The development of local, offline LLM programs (e.g., using CodeLlama) is being investigated for secure source code examination within protective orders, avoiding confidentiality issues with cloud-based AI. AI can also aid in semantic similarity searching for patent claims and product text, building on prior work like the “CodeClaim” project.
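[One plausible setup for the local, offline LLM idea above — a sketch under stated assumptions, not a description of my actual tooling: CodeLlama run through the third-party llama-cpp-python package, with the model weights copied onto the air-gapped machine in advance. The GGUF filename and the decompiled_function.c input are hypothetical.]

```python
from llama_cpp import Llama  # third-party: pip install llama-cpp-python

# Model file must already be on the offline machine; filename is hypothetical.
llm = Llama(model_path="codellama-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

snippet = open("decompiled_function.c").read()  # hypothetical decompiled output

prompt = (
    "Summarize what the following decompiled C function appears to do, "
    "and note any cryptographic or network-related operations:\n\n" + snippet
)

out = llm(prompt, max_tokens=512, temperature=0.2)
print(out["choices"][0]["text"])
```

The point of a local model here is that no client code leaves the machine, which is what protective orders typically require.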

IV. Other Related Techniques and Practices
• Source Code Examination: When source code is available (e.g., through discovery or open-source repositories), it is the primary technical evidence. Examination involves comparing code to patent claims, searching, tracing execution paths, and close reading. Custom tools are often developed on-site due to protective order restrictions.
• Firmware Analysis: Specialized reverse engineering skills are applied to firmware, which is software embedded in hardware devices.
• Code Mapping and Fingerprinting: Creating “maps” of source code from binary/object products, identifying API call sequences as “fingerprints,” and generating “opstrings” and MD5 digests for large-scale binary code comparisons to detect similarity or cloning (a minimal opstring sketch appears at the end of this section). The “CodeClaim” project extracted and indexed information from publicly-accessible binary code files for prior art and infringement detection.
• Obtaining Product Information: Gathering information from publicly available software development kits (SDKs), Application Programming Interface (API) documentation, and open-source codebases, which can provide insights into a product’s internal operation without direct reverse engineering. This includes correlating with public documentation archived online.
• “Teardown Labs”: Specialized labs are used for hardware reverse engineering, especially for identifying internal components and circuit analysis. While primarily hardware-focused, this can complement software analysis, especially for embedded systems.
• Automated Code Comparison: Developing software tools to automatically find similarities between different codebases (source or binary) for copyright, trade secret, or patent infringement cases. This includes tokenization and structural testing.
These techniques allow legal teams to conduct thorough investigations, often before source code becomes available in discovery, helping to establish the factual basis required for patent infringement claims, invalidity defenses, or other software-related legal disputes.
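[A hedged sketch of the “opstring” fingerprinting idea mentioned above — my illustration, not the actual CodeClaim implementation: reduce a run of disassembled instructions to their mnemonics and hash the result, so large numbers of binary regions can be compared by digest rather than byte-for-byte. Capstone is used for disassembly; the example byte strings are hard-coded purely for illustration.]

```python
import hashlib
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def opstring_digest(code_bytes, base=0x1000):
    """Disassemble a code region, keep only the instruction mnemonics
    (an 'opstring'), and return an MD5 digest usable as a fingerprint."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    mnemonics = [insn.mnemonic for insn in md.disasm(code_bytes, base)]
    opstring = " ".join(mnemonics)
    return hashlib.md5(opstring.encode()).hexdigest()

# Illustration: two regions with identical instruction sequences but
# different register operands yield the same digest.
region_a = b"\x55\x48\x89\xe5\x31\xc0\x5d\xc3"  # push; mov; xor eax,eax; pop; ret
region_b = b"\x55\x48\x89\xe5\x31\xdb\x5d\xc3"  # push; mov; xor ebx,ebx; pop; ret
print(opstring_digest(region_a) == opstring_digest(region_b))  # True: same mnemonics
```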

Source Code Examination in Litigation

Here are 5 additional questions related to source code examination, drawing on the provided sources:
1. Given the stringent protective orders (POs) typically imposed in software litigation, what specific restrictions do source code examiners commonly face, and how do these restrictions impact the methodologies and tools they can use for analysis, particularly regarding internet connectivity, note-taking, and comparisons with external codebases? Protective orders often require source code examination to occur on standalone, non-networked computers, prohibiting internet access, USB drives, and sometimes even laptops or mobile devices in the examination room. This forces examiners to rely on pre-approved tools or to craft custom scripts using only the scripting facilities available on the locked-down computer. Restrictions on note-taking often limit examiners to handwritten notes, and “no copying” clauses can prevent direct quotations of source code in expert reports or claim charts, requiring references to function names, file names, and line numbers instead (see the indexing sketch following these questions). Furthermore, the lack of internet connectivity can hinder comparisons with public open-source codebases or external repositories of code signatures.
2. Beyond mere keyword searching, what are the sophisticated methodologies and “gotchas” involved in “close reading” and “tracing” source code in litigation, and how do these techniques address issues like implicit function calls, data flow, and the potential inaccuracy of programmer comments or naming conventions? Source code examination extends far beyond simple keyword searches, as relevant code may use different terminology than patent claims, requiring the use of synonyms and understanding implementation details. “Close reading” involves scrutinizing code to understand its actual operation, not merely relying on potentially inaccurate function names, variable names, or comments. “Tracing” involves following the flow of execution (“who calls this function?”) and data (“who does this function call?”) to understand a code’s role and functionality, which is distinct from searching. “Gotchas” include accounting for function pointers, vtables, operator/function overloading, implicit function invocations (e.g., constructors/destructors), event handlers, and run-time dynamic linking, which mean that code’s execution flow is not always immediately obvious from static text. Tracing also involves understanding data flow, data structures, and global variables, which can function as code (“data reverse engineering”).
3. How do experts ensure the completeness and responsiveness of a source code production in discovery, and what methods are employed to correlate the produced source code with the actual commercial product to identify “dead code,” “latent code,” or missing files? Experts perform both external and internal tests to check the completeness and responsiveness of a source code production. External tests involve comparing the produced source code with information gleaned from reverse engineering the publicly available commercial product (e.g., filenames, pathnames, API names found in the product) to identify any missing files or inconsistent versions. Internal tests might involve checking out- and in-links within the code or attempting recompilation, although recompilation can be challenging due to toolchain dependencies. The correlation process is crucial to identify “dead code” (code not compiled or shipped with the product), “latent code” (code present but never executed/invoked), or code that only runs under rare circumstances. This correlation often requires combining static analysis of the source code with dynamic reverse engineering (e.g., running the product with instrumentation, debugging, network monitoring) to verify actual execution and the role of the code within the product.
4. In what specific ways does source code examination for litigation differ from standard software development practices like code maintenance or security auditing, particularly concerning its focus on comparison, narrowly-framed technical questions, and external (outsider) perspective? Litigation-driven source code examination is not a holistic exercise to understand the code as a whole, but a focused search for specific elements and steps to match patent claim limitations or other legal issues. Unlike typical development, it heavily emphasizes comparison: comparing code to patent claims (in patent litigation), or comparing two different source code repositories (in copyright or trade secret cases). While standard software inspection methodologies are often leveraged, the litigation context requires answering narrowly-framed technical questions arising directly from legal issues (e.g., elements of a patent claim or cause of action). An “outsider’s view” is often more credible and can uncover information missed by insiders familiar with the code, similar to how third-party security testers find bugs. This external perspective, often using the same tools as the opposing side, helps ensure a neutral, considered methodology and prepares for Daubert challenges.
5. What role can Artificial Intelligence (AI) and Large Language Models (LLMs) play in assisting source code examination, including summarization, comparison, and identifying patterns, while addressing the critical challenges of data confidentiality and adherence to protective orders in a litigation setting? AI and LLMs can assist in various source code examination tasks such as summarizing code files, comparing them, and identifying patterns or key features. LLMs are trained on vast amounts of code, which could enable them to focus on code structure with reduced reliance on function names and comments, potentially aiding analysis of obfuscated or reverse-engineered code. They can help examiners overcome “what code might be missing here?” questions and generate checklists of issues to consider during comparisons. However, a critical challenge in litigation is data confidentiality and adherence to protective orders, which typically mandate that confidential code is examined on secure, air-gapped computers without internet access. To address this, LLM capabilities for litigation would need to be implemented using local, offline models after installation, ensuring no client code is uploaded to cloud services. AI tools can also help with tasks like indexing relationships across files, identifying unlicensed third-party libraries, or flagging functions without error handling.
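[On the “no copying” restrictions described in question 1: here is a minimal sketch — my illustration, not a tool described in the sources — of the kind of script an examiner might write on a locked-down review machine. It indexes where a search term appears, recording only file names and line numbers, so notes can cite locations rather than quote code.]

```python
import sys
from pathlib import Path

def index_term(root, term, extensions=(".c", ".cpp", ".h", ".java")):
    """Walk a source tree and record file name and line number (not the
    code itself) for every line containing the search term."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix.lower() not in extensions:
            continue
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for lineno, line in enumerate(lines, start=1):
            if term in line:
                hits.append((str(path), lineno))
    return hits

if __name__ == "__main__":
    root, term = sys.argv[1], sys.argv[2]
    for filename, lineno in index_term(root, term):
        print("%s : line %d" % (filename, lineno))
```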

AI Role in Software Litigation

AI analysis capabilities, particularly for code, are significantly impacting software litigation and patent claims by transforming how code is examined, how infringement is investigated, how prior art is discovered, and how expert witnesses leverage technology.
1. AI for Code Analysis and Reverse Engineering
Large Language Models (LLMs) and AI chatbots like Google Gemini and Anthropic’s Claude are capable of analyzing various forms of code, including source code, reverse-engineered (disassembled and decompiled) code, and even binary executable files. This capability extends to understanding obfuscated code by discerning underlying patterns and mathematical operations, rather than solely relying on symbolic information like function names or comments. While AI is highly promising as an “AI-assisted reverse engineering” tool, it is not yet fully self-operational and still requires human assistance.
2. Impact on Patent Claims and Infringement Analysis
AI’s analytical power directly influences how patent claims are assessed for infringement:
• Claim Charts and Infringement Contentions: AI tools, such as ChatGPT, can assist in drafting claim charts for both infringement and invalidity contentions, and even preliminary drafts of expert reports. These tools can help in the “limitation-by-limitation comparison” of a patent claim with an accused product, which is central to demonstrating infringement or non-infringement.
• Code-to-Claim Mapping: AI can enhance the process of matching patent claims to code by focusing on functional concepts and patterns, allowing for more robust comparisons even when terminology differs between the patent and the accused software. This can address situations where source code might use different language (e.g., X+Y+Z) than the patent claim (A+B+C) but still embody the same limitation.
• “Latent Code” and Capability: AI can aid in analyzing “latent code”—code that is present but not actively executed—to determine if it infringes. This distinction is critical, especially for method claims, as unused code may infringe an apparatus claim (by “making”) but not a method claim (which requires “using”). AI can help in discerning if software is merely capable of performing a claimed function, which can be sufficient for infringement depending on claim construction.
• Functional Claiming (Means-Plus-Function): For claims with functional language (e.g., “means for…”) governed by 35 U.S.C. § 112(f), AI can help align the code in accused products with the specific structures, acts, or materials described in the patent specification that perform the claimed function.
3. Impact on Patent Validity and Prior Art Discovery
AI’s capabilities are also revolutionizing the search for prior art, which is crucial for challenging patent validity:
• “Code Mining” for Prior Art: AI facilitates “Code Mining” or “Big Code” initiatives to extract and index useful information from publicly accessible commercial code, including binary executables. This allows for searching the contents of software files for prior art, going beyond mere string extraction. This addresses the long-standing “software prior-art problem” where the state of the art is often hidden in undocumented aspects of products.
• Semantic Similarity for Prior Art: AI can utilize large language models and neural networks trained on patent data to generate sentence embeddings (mathematical representations of text) for patent claims. This enables patent searching based on the semantic similarity of claim language, which is more advanced than traditional keyword-based searches. This can help uncover prior art using claims as proxies for earlier disclosures (a minimal embedding sketch follows this list).
• Detecting Absences for Non-Infringement/Invalidity: AI’s analytical strength can be applied to methodologies for looking for “absences” in code, which is relevant for asserting non-infringement or for patent claims that include “negative limitations” (e.g., “without utilizing…”).
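[A hedged sketch of the sentence-embedding approach mentioned above; the model name and example texts are mine, chosen purely for illustration — the PatClaim work referenced in the sources may use different, patent-specific models. The third-party sentence-transformers library scores semantic similarity between a claim limitation and candidate prior-art text.]

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# General-purpose model chosen purely for illustration; a model fine-tuned
# on patent text would likely perform better on claim language.
model = SentenceTransformer("all-MiniLM-L6-v2")

claim_limitation = (
    "a module configured to compress the data stream prior to transmission"
)
candidate_passages = [
    "The sender applies lossless compression to packets before they are sent.",
    "The user interface displays a progress bar during file downloads.",
]

claim_vec = model.encode(claim_limitation, convert_to_tensor=True)
passage_vecs = model.encode(candidate_passages, convert_to_tensor=True)

scores = util.cos_sim(claim_vec, passage_vecs)[0]
for passage, score in zip(candidate_passages, scores):
    print("%.3f  %s" % (float(score), passage))
```

The first passage should score noticeably higher than the second even though it shares almost no keywords with the claim language, which is the point of semantic (rather than keyword) matching.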
4. Impact on Litigation Process and Expert Witnesses
AI is changing the tools and methods used by legal professionals and experts:
• Pre-filing Investigation: Courts often require “reverse engineering or its equivalent” as part of a “reasonable pre-filing investigation” before a patent infringement lawsuit can be filed. AI-assisted reverse engineering can help meet this obligation by analyzing publicly available software products to identify potential infringement or prior art. Firms like GreyB leverage hands-on reverse engineering for prior art searches and infringement investigations.
• Expert Reports and Admissibility: AI-generated material, such as summaries of code or reverse-engineered content, could be incorporated into an expert witness’s report after careful checking and supplementation. However, experts must explicitly label AI output, test it against a baseline of known results (including obfuscated vs. non-obfuscated code), and demonstrate how they probed the AI’s findings to ensure reliability and address concerns about “hallucinations”. The “general acceptance” of LLMs in software development may support their use in reports.
• Challenges and Limitations:
    ◦ Confidentiality: A significant challenge is the use of confidential source code with cloud-based AI systems due to protective orders. This necessitates the development and use of local, offline AI models for source code examination within secure environments.
    ◦ Interpretability vs. Explainability (XAI): While AI can interpret how a network processes information, providing explanations that align with human mathematical reasoning remains a gap. This “explainable AI” (XAI) is crucial for expert witnesses who need to articulate their methodologies and findings. AI systems themselves may lack “introspection” into their own processes, and their self-descriptions can be inaccurate.
    ◦ Accuracy and Hallucinations: LLMs are known for “hallucinating” or overgeneralizing, emphasizing the need for rigorous human validation of AI-generated content.
    ◦ Training Data Bias: Current LLM training is largely based on open-source code, which may not accurately reflect the vast quantity of proprietary commercial code relevant to litigation.
5. Future Outlook
The integration of AI into software litigation and patent claims is an evolving field. Ongoing work includes documenting the use of machine learning to model patent claims and litigation data, and revising software examination methods to account for AI models themselves. Projects like PatClaim.ai and PatLitig.ai are exploring advanced applications such as patent mining based on claims, semantic similarity searching, patent landscaping, and AI-assisted claim chart drafting. The abilities of AI to analyze code patterns, even in obfuscated or reverse-engineered forms, suggest a future where AI tools will automate many tedious aspects of code analysis in legal contexts, while still requiring human expertise for validation, interpretation, and strategic application.

Why so much here on NotebookLM?

[I asked the NotebookLM chatbot why there seems to be so much at my web site about NotebookLM.]
NotebookLM (NBLM) holds significant importance in the overall context of the “Software Litigation Consulting” website, primarily as a central tool for the author, Andrew Schulman, in writing, organizing, and disseminating content for his forthcoming book on Patent Litigation. It also serves as a practical platform for testing and evaluating the capabilities and limitations of AI technologies for applications relevant to software litigation.
Here’s a breakdown of NBLM’s significance:

1. Book Authoring and Content Generation:
    ◦ Aids in writing a book: The author is actively using Google’s NotebookLM to help complete his “long-delayed book on Patent Litigation”.
    ◦ Generates diverse content: NBLM takes uploaded documents or links to notes and automatically generates various materials, including document summaries, comprehensive briefings, FAQs, study guides, audio overviews, and mind maps.
    ◦ Transforms rough notes: It has demonstrated a “remarkable job” in producing coherent documents from the author’s “generally sloppy state of the notes”. The author even notes that NBLM’s audio overviews and mind maps sometimes summarize key points and nuances “better than I’ve been able to do myself”.
    ◦ Assists in structuring: The auto-generated mind map is considered accurate and “useful in thinking how to whip all my docs… into a coherent table of contents”.

2. Development of an AI “Chatbook”:
    ◦ Interactive book concept: A key application of NBLM is its use in creating an “AI chatbook” for the Patent Litigation book. This chatbook is designed to act as an interactive version of the forthcoming book, allowing readers to ask questions and receive “extensive answers” based on the author’s notes without exposing the raw, unfinished material.
    ◦ Early publication mechanism: The chatbook serves as “a way to publish early versions of a book from author’s notes”.
    ◦ User interaction: It enables users to ask specific questions about patent litigation, such as “What are underlying purposes for claim charts?” or “How does collateral estoppel apply to patent invalidity?”. NBLM also suggests follow-up questions to guide interaction.

3. Platform for AI Evaluation and Critique:
    ◦ Testing RAG capabilities: NBLM is a Retrieval Augmented Generation (RAG) system, meaning it primarily answers questions using the provided source documents rather than its general training data. This characteristic is highlighted as making it “less likely to hallucinate than a standard AI chatbot”.
    ◦ Critiquing AI output: The author frequently critiques NBLM’s output, noting instances where summaries are “plainly wrong” or “unnecessarily vague”. These critiques are used to identify typical misunderstandings a human reader might have and to improve the original notes and future AI prompts.
    ◦ Analyzing AI summarization: NBLM’s performance in summarization raises questions about an LLM’s ability to discern importance, address accuracy, and avoid overgeneralization.
    ◦ Comparing AI models: The author uses NBLM in conjunction with other LLMs like ChatGPT, Claude, and Gemini, sometimes having NBLM merge and compare analyses from these different models.
    ◦ Limitations noted: The author observes that NBLM answers questions “from the contents of your docs, seemingly without any ‘meta’ understanding about the docs”. It also has limitations in generating certain types of media directly within a chat interface and requires disclaimers for legal advice.

4. Relevance to Software Litigation Consulting:
    ◦ Practical application of AI: While directly focused on book writing, the capabilities of NBLM in analyzing and synthesizing large volumes of information are highly relevant to the core services of “Software Litigation Consulting.” These include assisting with source code examination, preparing drafts of expert reports, and understanding complex technical and legal documents in patent, copyright, and trade secret cases.
    ◦ Exploration of future tools: The author is also investigating the development of an AI chatbot front-end for the website using major documents and is working on local/offline LLM programs for secure source code examination in litigation. NBLM projects are mentioned as examples of how such tools might analyze multiple code files simultaneously.
In essence, NBLM is not just a passive tool; it’s an active participant in the author’s ongoing work, pushing the boundaries of how legal-technical expertise can be created, organized, and shared through AI-powered interfaces on the “Software Litigation Consulting” website.