Our recent post “Tracking AI Prosecution Trends at the U.S.
Patent Office” presented USPTO data suggesting that future
prosecution of AI inventions may be less focused on patent
eligibility under 35 U.S.C. §101 and more focused on the
traditional requirements of §§ 102, 103, and 112. This
post is the first of a two-part series looking into the challenges
that AI inventions present to one of these traditional
requirements: patent disclosure under 35 U.S.C.
§112(a). In this Part I, we identify the unique
disclosure issues with AI inventions. In Part II, we provide
practice tips for describing and enabling AI inventions.
A fundamental premise of most patent systems is
the quid pro quo by which an inventor discloses
his or her invention to the public in return for exclusive rights
in the invention for a limited time. Recent advances in
artificial intelligence (AI) have sparked debate as to whether
current patent disclosure requirements can enrich the public with
AI inventions such that the granting of the exclusive right is
justified. This debate inevitably centers on the “black
box” nature of a particular type of AI: machine
learning. Machine learning is the dominant AI technique
disclosed in patents.1 As such, understanding the
patent disclosure issues presented by AI inventions requires an
understanding of the basics of machine learning.
Basics of Machine Learning
It is well known that machine learning requires first
“training” a system using a training data set, and then
deploying the trained system to “infer” predictions from
new data not previously seen by the system. The training
process attempts to correlate the input data to some desired output
prediction or classification. The distinction between
different types of training such as “supervised,”
“unsupervised,” and “reinforcement” learning
generally relates to the level of human intervention in the
training process, and more specifically to the meaning that humans
give to the training data prior to training of the machine learning
system. In supervised learning, the training data set is
carefully analyzed by domain experts to identify features that are
relevant to the desired output of the system, and to label each
sample of the training data set as one of the possible target
predictions of the system. By contrast, in unsupervised learning
no features or labels are provided for the input training data in
advance; rather, the machine learning algorithm itself identifies
features of the input data that permit segregating the input
samples into unlabeled output groups.
Unsupervised learning techniques are less developed than supervised
learning techniques, which are still required to some degree in
nearly all present-day implementations of machine learning. For
example, the output of a purely unsupervised learning algorithm may
be used to define the feature set for a supervised learning system
in which experts provide target labels that render the output
groupings meaningful to humans.
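The contrast can be sketched in a few lines of Python. The data, the naive initialization, and the one-dimensional clustering loop below are purely illustrative assumptions, not drawn from any real system:

```python
# Supervised learning: a domain expert labels each sample in advance.
labeled_samples = [(1.0, "small"), (1.2, "small"), (9.8, "large"), (10.1, "large")]

# Unsupervised learning: only raw values are given, and a simple
# k-means-style loop discovers two unlabeled groups on its own.
values = [1.0, 1.2, 9.8, 10.1]
centers = [values[0], values[-1]]      # naive initialization: first and last value
for _ in range(10):                    # alternate assignment and update steps
    groups = [[], []]
    for v in values:
        nearest = min((0, 1), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)      # assign each value to its nearest center
    centers = [sum(g) / len(g) for g in groups if g]

print(groups)  # two unlabeled clusters: [[1.0, 1.2], [9.8, 10.1]]
```

The supervised samples carry human-assigned meaning (“small,” “large”); the unsupervised groups are discovered by the loop but remain unnamed until a human interprets them.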
The machine learning training phase begins with a collection of
training data from a particular domain in which the machine
learning system will be applied to solve a real-world problem. A
common example would be a training data set including many images
of different types of fruit (apples, oranges, bananas,
etc.). The training data typically includes an identified set
of features shared by all input samples and relevant to the output,
as well as the possible labeled output predictions to be made from
values of the feature set in a particular sample. In the fruit
example, the features may be shape, color, size, etc., and the
possible labeled outputs may be apples, oranges, bananas,
etc.
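Concretely, a toy training set for the fruit example might be structured as in the Python sketch below; the feature names, units, and values are invented for illustration only:

```python
# Each training sample pairs a fixed tuple of feature values with a
# human-assigned label (all names and numbers here are hypothetical).
FEATURES = ("shape_roundness", "color_hue", "size_cm")

training_data = [
    # (roundness 0-1, hue in degrees, size in cm) -> label
    ((0.95, 10.0,  8.0), "apple"),
    ((0.90, 30.0,  7.5), "orange"),
    ((0.20, 60.0, 18.0), "banana"),
    ((0.93,  8.0,  8.2), "apple"),
]

# The set of possible labeled output predictions the system can make.
labels = sorted({label for _, label in training_data})
print(labels)  # ['apple', 'banana', 'orange']
```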
A machine learning algorithm can be thought of as a very generic
mathematical function that roughly correlates any input data with
possible output predictions. When the algorithm is run on
training data, it uses the value of each predefined feature in a
particular sample to associate that sample with one of the
predefined possible predictions that can be made about the
sample. The algorithm then compares its selected prediction to
a human labeled correct prediction for the sample to determine the
accuracy of the algorithm’s prediction. As may be expected
from the generic nature of the mathematical functions that underlie
machine learning algorithms, the algorithm’s output on initial
samples of the training data is more akin to a guess than a
prediction. However, all machine learning algorithms have a
set of basic parameters that can be adjusted to improve accuracy in
selecting the correct output prediction. With each erroneous
prediction, the algorithm adjusts one or more of these standard
parameters to lower its error rate.
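The guess-compare-adjust cycle can be illustrated with a perceptron, one of the simplest machine learning algorithms; the samples, learning rate, and number of passes below are arbitrary choices for the sketch:

```python
# Toy two-feature, two-class training data: (features, correct label).
samples = [((1.0, 0.2), 1), ((0.9, 0.1), 1), ((0.1, 0.9), 0), ((0.2, 1.0), 0)]

weights = [0.0, 0.0]  # the adjustable parameters of the algorithm
bias = 0.0
lr = 0.1              # learning rate: how far to adjust after each error

for _ in range(20):   # repeat over the training data many times
    for features, correct in samples:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        guess = 1 if score > 0 else 0   # the algorithm's prediction
        error = correct - guess         # compare to the human label
        if error:                       # adjust parameters to lower the error rate
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
```

After a few passes the adjusted weights classify every training sample correctly, which is the point at which more realistic implementations stop training.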
The above process of guessing, determining error, and adjusting
standard parameters to reduce the error rate is repeated with many
more samples of the training data until the machine learning
algorithm finds the optimal parameters that represent the complex
relationship linking the features of the input data to the correct
output prediction. This optimization performed during the
training process transforms the machine
learning algorithm into a machine
learning model. The model is the thing that is
saved after the training process and deployed in a system to
perform a real-world task. Returning to the fruit example, the
trained model can be deployed in a system that receives new
unlabeled images of any type of fruit and accurately predicts
whether the image is an apple, orange, banana, etc. The model
does this by applying the complex patterns “learned”
during the training phase to the new image data.
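The point that the saved, deployable artifact is the trained model (a set of learned parameters) rather than the training algorithm can be sketched as follows; the parameter values and label names are made up for illustration:

```python
import pickle

# The "model" is just the learned parameters left over after training.
model = {"weights": [0.09, -0.07], "bias": 0.0, "labels": ["banana", "apple"]}

saved = pickle.dumps(model)        # serialized form that gets deployed
deployed = pickle.loads(saved)     # reloaded inside the production system

def predict(m, features):
    # Apply the learned parameters to a new, never-before-seen sample.
    score = sum(w * x for w, x in zip(m["weights"], features)) + m["bias"]
    return m["labels"][1] if score > 0 else m["labels"][0]

print(predict(deployed, (1.0, 0.3)))  # 'apple'
```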
Some understanding of the operation of the trained machine
learning model can be gleaned from the familiar visual of an
artificial neural network (ANN), which is a particular type of
machine learning model. ANNs include a complex web of
interconnected computing units called neurons that are grouped into
discrete layers. The connections between neurons of the network are
essentially statistical weighting values that represent the
importance of each of the input features to predicting the correct
output. In complex “deep learning” neural networks, the
feature weighting values are buried within many neuron layers
hidden between input and output layers of the model. Each
sample of input data to the ANN traverses the neural network based
on the feature values of the particular sample and the weighted
connections between the neurons. This traversal ideally results in
an accurate prediction that the particular sample is associated
with one of the target outputs of the system.
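A miniature forward pass through such a network can be written out directly; the layer sizes and weight values below are invented solely to show how a sample traverses the weighted connections:

```python
import math

def sigmoid(x):
    # Squashes a neuron's weighted input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Weighted connections: one row of weights per neuron in the layer.
hidden_w = [[0.5, -0.4], [-0.3, 0.8]]   # 2 inputs -> 2 hidden neurons
output_w = [[1.2, -0.7], [-1.1, 0.9]]   # 2 hidden neurons -> 2 outputs

def forward(features):
    hidden = [sigmoid(sum(w * x for w, x in zip(neuron, features)))
              for neuron in hidden_w]
    output = [sum(w * h for w, h in zip(neuron, hidden))
              for neuron in output_w]
    return output.index(max(output))    # index of the predicted target output

print(forward((1.0, 0.0)))  # predicts class 0
```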
Unique Challenges with Disclosing Machine Learning
Inventions
It is often said that machine learning is “automated
programming.” That is, the machine learning algorithm (a
human-written program) inputs data from a particular problem domain
and outputs a model (a computer-written program) that can solve
problems in that domain. But if machine learning merely
automates the human activity of programming, why would it be more
challenging to disclose machine learning inventions? After all,
manually written software programs have been described in patents
for many decades. A better understanding of these disclosure
challenges can be gained by considering the distinction between
the algorithm and
the model of machine learning systems. At this
point, it should be noted that the terms “algorithm” and
“model” are often used interchangeably throughout the
literature on machine learning systems. For example, the
result of the training process is frequently described as simply a
“trained algorithm.” To facilitate understanding of
the unique disclosure challenges, we adopt the description of
machine learning systems set forth above in which the machine
learning algorithm is transformed into a
trained model during the training
process. This distinction in terminology will also be useful
in Part II of this series because the term “algorithm”
has developed some legal meaning in U.S. patent law, whereas the
term “model” generally has not.
Machine learning algorithms have been the subject of academic
research for many decades. Although the underlying
mathematical functions of machine learning algorithms are quite
complex, the training process is fairly well understood. This
understanding has developed mainly through refinement of the
training process into the granular steps involved in building
relatively simple models having few input and output
variables. Several descriptive tools have evolved to explain
these training steps. For example, academic research papers
often use flow charts, mathematical formulas, and pseudocode to
describe the training steps. Indeed, our understanding has
come to the point where software developers can implement machine
learning algorithms in the source code of many modern programming
languages. Thus, the machine learning algorithm itself is akin
to run-of-the-mill software programs outside of the AI context.
By contrast, the machine learning model is the “black
box” aspect of AI systems. As stated by IBM, the largest
patent filer for AI inventions in the U.S., “AI inventions can
be difficult to fully disclose because even though the input and
output may be known by the inventor, the logic in between is in
some respects unknown.”2 One reason that the
model is considered a black box is the enormous complexity of
present day models such as deep learning neural networks. As
noted, the model is essentially a large set of numerical
statistical weighting values that represent the complex
interrelatedness of many input features of the training data that
determine the output prediction. However, these numerical
weighting values (which may be produced in a data table, for
example) have little meaning even to experts because the magnitude
and sign of the numerical weightings emerge from random
initialization and automated adjustment during the training
process.
In addition, complex models are a relatively recent artifact of
machine learning research and have not themselves undergone much
study. Although the training algorithms that produced
relatively simple ANNs a decade ago were theoretically capable of
building more complex models, it is only through recent advances in
computing power and the availability of large datasets that complex
ANNs with hundreds of hidden layers have become a
reality. Tools are still being developed for understanding the
decision making process that occurs within such deep learning
models. Thus, while machine learning algorithms can be
described with source code precision, the complex models built by
these algorithms are presently described by vague analogy to other
little-understood systems like biological neural
networks.
Finally, some of the precise inner workings of the machine
learning model may simply be beyond the limits of the human brain
to comprehend. These models represent the patterns discovered
by automated iterations through massive amounts of
information. As the number of input features of these models
increases, the feature interrelatedness that provides the path to
an accurate prediction may be undetectable or imperceptible to
humans. Evidence of this can be seen in the area of computer
vision where it is well publicized that machine learning models
have become more accurate than humans in recognizing and sorting
images of some objects.
Possible Enhancements to Disclosure Requirements for Obtaining
AI Patents
The patent policy debate noted at the outset of this post more
precisely centers on enhancing patent disclosure requirements to
mitigate the black box nature of machine
learning models. For example, some scholars have
called for long term legislative changes to establish a data
deposit requirement for training data and/or the machine learning
model itself, akin to the sequence listings or biological material
deposit requirements found in life sciences
patents.3 Others suggest that the USPTO can provide
these and other disclosure enhancements through patent examination
rules and policies.4 However, a recent USPTO report
on AI and patent policy suggests the agency’s view that no
adjustments are needed to current disclosure laws or examination
policies.5 Nevertheless, even without legislative
or regulatory changes, courts may interpret existing patent law
doctrines as requiring a greater level of disclosure for AI
inventions. This raises the more immediate practical question of
how patent practitioners can meet current U.S. disclosure
requirements in view of the challenges with describing AI
inventions. In Part II, we will discuss techniques for
drafting AI patents in compliance with the written description and
enablement requirements of 35 U.S.C. §112(a).
Footnotes
1. See WIPO Technology Trends 2019 – Artificial
Intelligence at https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
2. See IBM Comments in Response to “Request for
Comments on Intellectual Property Protection for Artificial
Intelligence Innovation” at https://www.uspto.gov/sites/default/files/documents/IBM_RFC-84-FR-58141.pdf
3. See Artificial Intelligence Inventions &
Patent Disclosure, T. Ebrahim, Penn State Law Review, Vol. 125, No.
1, 2020.
4. See Clearing Opacity Through Machine Learning, W.
Nicholson Price II & Arti K. Rai, 106 Iowa L. Rev. 775
(2021).
5. See Public Views on Artificial Intelligence and
Intellectual Property Policy, October 2020 at https://www.uspto.gov/sites/default/files/documents/USPTO_AI-Report_2020-10-07.pdf
The content of this article is intended to provide a general
guide to the subject matter. Specialist advice should be sought
about your specific circumstances.