
The LLM Misinformation Problem I Was Not Expecting

Kathleen Moriarty

5 min read


The prolific use of Artificial Intelligence Large Language Models (LLMs) presents new challenges we must address and new questions we must answer. For instance, what do we do when AI is wrong?


I teach two Master's-level courses at Georgetown, and as such, I've received guidance on how the program allows the use of tools like ChatGPT and Bard. I expected to see students use AI and LLMs without properly validating generated content or providing attribution to the content sources.

In one instance, students turned in oddly similar work that may have originated, in part or in full, from AI LLMs. In that particular case, however, they sought supporting materials in a manner similar to the use of an Internet search engine. Then the fall 2023 semester began, and a new pattern emerged.

A trend of non-vetted content

Not long into the fall 2023 semester, students began to cite blogs and vendor materials that sounded plausible but were partly or entirely incorrect. This problem traces back to LLMs producing "hallucinations." In some cases, vendor content creators incorporate these untrue materials directly into their published content without vetting or correcting them.

The problem was not infrequent during the fall 2023 semester. In the past four years of teaching three semesters a year, I had encountered just one activity where several students found incorrect information because of a highly ranked search result. During the fall 2023 semester, however, I noticed the problem on at least three separate assignments. In one case, the information was put together so well in the source materials that it caught me off guard. I had to validate my own thoughts with others to confirm!

Let's take a look at a couple of examples to better understand what's going on.

Misidentifying AI libraries/software as operating systems

In one example, students referenced descriptions that presented what appear to be AI-related libraries or software as operating systems. In a recent module on operating systems, for instance, students enthusiastically described "artificial intelligence operating systems (AI OS)" and even "Blockchain OS." There's just one issue: there's no such thing as an AI OS or Blockchain OS.

This content made it online because no one corrected it before it was published in multiple places as blog content. Inaccurate descriptions, such as those presenting AI libraries or software development kits as operating systems, add confusion when students and even professionals use internet resources to learn about new developments and technologies.

In this case, students needed to learn about the evolution of operating system architecture. Vetted materials were available, but some students veered into their own research and wound up using sources with content that was not accurate. To be fair, the content was very descriptive and convincing, although incorrect.

The issue here is more than a matter of semantics or nuance. This type of content makes it more difficult for students to grasp the purpose of an operating system versus libraries, software development kits, and applications: concepts that are fundamental to system architecture and its security.

False authentication protocols

Another example of non-vetted AI output involves online content that inaccurately describes authentication, creating misinformation that continues to confuse students. For instance, some AI LLM results describe the Lightweight Directory Access Protocol (LDAP) as an authentication type. While LDAP does support password authentication and can serve up public key certificates to aid in PKI authentication, it is a directory service. It is not an authentication protocol.
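To make the distinction concrete, here is a minimal sketch using the Python ldap3 library with a hypothetical directory server and entry names (ldap.example.com, uid=alice, and so on are illustrative, not from any real deployment). The simple bind operation checks a password against a directory entry, which is why LDAP is often mistaken for an authentication protocol, but the protocol itself exists to query and maintain directory data, such as the certificates a PKI-based login might rely on.

```python
# A minimal sketch, assuming the Python ldap3 library and a hypothetical
# directory server and entry names (ldap.example.com, uid=alice, ...).
from ldap3 import ALL, Connection, Server

server = Server("ldap.example.com", get_info=ALL)

# Simple bind: prove knowledge of the password stored for one directory entry.
# This is the feature that leads people to call LDAP an "authentication protocol."
conn = Connection(
    server,
    user="uid=alice,ou=people,dc=example,dc=com",  # hypothetical DN
    password="example-password",                   # hypothetical credential
)

if conn.bind():
    # The core purpose of LDAP: querying directory data. Here we look up
    # attributes a PKI-based login might need, such as a public key certificate.
    conn.search(
        search_base="dc=example,dc=com",
        search_filter="(uid=alice)",
        attributes=["mail", "userCertificate"],
    )
    print(conn.entries)

conn.unbind()
```

In other words, a login system can use LDAP as a building block, but the binds and lookups shown here are directory operations; the directory service itself is not an authentication protocol.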

Vetting in education and InfoSec

The problem I've described above is likely happening in fields beyond security architecture and design. When it comes to validating content in any field, two themes come up consistently:

Author credibility

  • Is the author recognised for the work, the topic cited, or closely related work?
  • Is there evidence that other experts have validated the content?
  • When was the material published? Have the authors applied any updates or corrections?

Source credibility

  • Do sources support the conclusions?
  • Are the sources ones you would consider to be trustworthy or known to be vetted?
  • If standards are referenced, do the materials provided by the standards committee support the language and claims? Are the technical terms used consistently with those of the standards committee?

As a way forward, the Center for Internet Security (CIS) will be adding a marker to blogs to communicate the level of review prior to publication. For my own blogs, I've reached out to known experts to review them. (In one case, I've decided to hold a post back from publication due to an oversight that requires correction.) This is more of an allow-list approach toward understanding what content has been vetted rather than expecting AI results to be marked.

As for fellow teachers, you can and should provide guidance on sources that are known to be reliable within a field of study. This is something I did with my students after detecting the problem. Students should check that sources have been vetted and that the content creator has the credentials to stand behind their published content.

Creating a new best practice

The problems around vetting AI results won't be going away anytime soon. It's important that educators give students the guidance they need to steer their research within a field of study. Education should embrace markers similar to those proposed by CIS. These tools can go through a consensus process to gain acceptance as a new best practice, which could ultimately prove useful for updating and sharing content expediently.



About the author

Kathleen Moriarty is a technology strategist and board advisor, helping companies lead through disruption. She is an Adjunct Professor at Georgetown SCS and also offers two corporate courses on Security Architecture and Architecture for the SMB Market. Formerly the Chief Technology Officer at the Center for Internet Security, Kathleen defined and led the technology strategy, integrating emerging technologies. Prior to CIS, Kathleen held a range of positions over 13 years at Dell Technologies, including Security Innovations Principal in the Dell Technologies Office of the CTO and Global Lead Security Architect for the EMC Office of the CTO, working on ecosystems, standards, risk management, and strategy. In her early days with RSA/EMC, she led consulting engagements with hundreds of organisations on security and risk management, gaining valuable insights into managing risk against business needs. During her tenure in the Dell EMC Office of the CTO, Kathleen had the honor of being appointed and serving two terms as the Internet Engineering Task Force (IETF) Security Area Director and as a member of the Internet Engineering Steering Group from March 2014 to 2018. She was named in CyberSecurity Ventures' Top 100 Women Fighting Cybercrime and is a 2020 Tropaia Award Winner for Outstanding Faculty at Georgetown SCS. She is a keynote speaker, podcast guest, frequent blogger bridging a translation gap for technical content, and conference committee member, and has been quoted in publications such as CNBC and Wired. Kathleen has over twenty-five years of experience driving positive outcomes across information technology leadership, short- and long-term IT strategy and vision, information security, risk management, incident handling, project management, large teams, process improvement, and operations management in multiple roles with MIT Lincoln Laboratory, Hudson Williams, FactSet Research Systems, and PSINet. She holds a Master of Science degree in Computer Science from Rensselaer Polytechnic Institute, as well as a Bachelor of Science degree in Mathematics from Siena College. Published work: Transforming Information Security: Optimizing Five Concurrent Trends to Reduce Resource Drain, July 2020.
