EP 3 686 882 A1 filed as PCT/CN2018/092114 relates to a filtering model training method, a speech recognition method, a training device, a speech recognition device, and a speech recognition system.
Brief outline of the case
The application was refused for lack of clarity of the claims of the MR as well as AR1-3.
The applicant appealed the refusal.
In a communication under Art 15(1) RPBA, the board laid out its preliminary opinion, according to which the skilled person would understand the term “syllable” according to its well-established meaning, as defined in paragraph [0069] of the application.
In its reply, the applicant explained that the board’s understanding was not correct, as what is referred to as syllables differs between languages.
The board was thus led to change its opinion: the ED might well have been correct in finding that the inconsistent use of the term “syllable” in the description rendered claim 1 unclear. However, in view of the other objections, this issue could be left undecided.
Eventually, the board confirmed the refusal.
The applicant’s point of view
The applicant argued that the core of the invention was to use a filtering model before sending the recorded speech signals to the speech recognition engine, and to train the filtering model based on syllable distances such that the output of the speech recognition engine was closer to the known text of the corpus.
The details of the filtering model, and of the scoring model that was also used in the training, were not what the invention was about. Accordingly, their realization was left to the technical knowledge of the skilled person.
The applicant added that the application provided sufficient information on both the filtering and the scoring model. For example, Figure 5 and paragraph [0196] of the application illustrated the nature of the models and their interrelation.
The skilled person would understand that the N syllable distances represented an output of the filtering model. The N syllable distances were also used to determine the scoring model.
Therefore, the K syllable distances, having been determined based on the scoring model, represented a measure of success of the filtering model. The claim did not restrict the training of the filtering model to one based only on the N syllable distances. Rather, it could also be based on other things, like the K syllable distances.
The applicant concluded that the skilled person would understand how to realize a suitable filtering model and scoring model, and there was no need for a more detailed definition in claim 1.
The board’s decision
The board shared the ED’s view that the nature of the filtering model and its training are unclear in claim 1.
The skilled person understands that a filtering model which is trained is thereby subject to machine learning. This means that training data are fed to the model and that its output, after further processing by the speech recognition engine, is compared with the correct results, which are assumed to be known. A measure of success is then determined, and certain variables of the model are repeatedly adjusted in order to maximize the success, i.e. minimize the error.
Claim 1 implies that the trained model will be used to filter an unknown sound signal before actual speech recognition is performed on the filtered signal. The training is expected to alter the model in such a way that its trained version leads to improved speech recognition.
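The generic training loop that the board attributes to the skilled person's understanding can be sketched as follows. This is only a toy illustration under stated assumptions, not the method of the application: the one-parameter "filter" (a single gain), the placeholder `recognize` function, the mean squared error and the finite-difference update are all hypothetical choices made for brevity.

```python
def recognize(signal):
    # Hypothetical stand-in for the speech recognition engine:
    # here it simply passes the filtered signal through unchanged.
    return list(signal)


def error(gain, inputs, expected):
    # Filter the input, run the (placeholder) recognizer, then compare
    # the result with the known correct output (mean squared error).
    recognized = recognize([gain * x for x in inputs])
    return sum((r - e) ** 2 for r, e in zip(recognized, expected)) / len(expected)


def train_filter(inputs, expected, steps=100, lr=0.01, eps=1e-6):
    gain = 0.0  # the single adjustable variable of the toy filter
    for _ in range(steps):
        # Estimate how the error changes with the gain (central difference),
        # then adjust the gain in the direction that reduces the error.
        slope = (error(gain + eps, inputs, expected)
                 - error(gain - eps, inputs, expected)) / (2 * eps)
        gain -= lr * slope
    return gain


# Toy data (assumed): the input is exactly twice the expected output.
trained_gain = train_filter([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

With this toy data the gain settles close to 0.5, i.e. the filter learns to halve the input. The point is only that each ingredient (training data, known correct result, measure of success, adjustable variables) must be identifiable for such a loop to be carried out.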
A model subjected to machine learning does not necessarily require a definition of its nature, or of the exact training process, if CGK allows the skilled person to understand which model to use and how to train it. For example, the skilled person might know what kinds of model respond to training with environmental noise data.
The present case is different in that the training is based on syllable distance information; for example, the information that syllable “ka” might have a large syllable distance from its recognized counterpart “ke”.
Here, without further explanation, it is not clear what kind of filtering model could be trained based on such information, and how such training could be performed in order to modify the model to improve the subsequent speech recognition.
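To make the kind of information at stake more concrete, the sketch below computes a distance per syllable pair between a known text and its recognized counterpart. The measure used (an edit distance over the characters of each romanized syllable) is purely an assumption for illustration; the application's own definition of syllable distance is not reproduced here, and the function names are hypothetical.

```python
def edit_distance(a, b):
    # Classic Levenshtein distance between two sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def syllable_distances(reference, recognized):
    # One distance per aligned pair of syllables (assumes equal length lists).
    return [edit_distance(r, h) for r, h in zip(reference, recognized)]


# Assumed toy example: "ka" was recognized as "ke", "ta" was recognized correctly.
distances = syllable_distances(["ka", "ta"], ["ke", "ta"])
```

Such a list of numbers says how far the recognition result is from the known text, but not by itself which variables of a filter acting on the sound signal should be adjusted, which is the gap the board points to.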
The steps of the “filtering model training method” defined in claim 1 contradict the assumptions of the skilled person regarding the training, and further obscure the nature of the filtering model and its training.
The board also came to the conclusion that there are clarity problems relating to sub-steps 4a, 4b, and 4c of the method.
The unclear training steps, including the unclear nature and role of the scoring model, are not suited to defining the nature of the filtering model and its training, which appears to be essential to properly define the invention.
Hence, the ED was right in finding that the subject-matter of claim 1 is not clear.
Comments
As we are in examination, it could be argued that essential features allowing the result to be achieved were missing.
In opposition, this would not have been possible, but Art 83 could have applied mutatis mutandis. Information is manifestly missing from the application as filed. This lack of information cannot be compensated by CGK, and the missing information, once added, would result in added matter.
After T 0048/24, commented on in the present blog, this is a new reminder to applicants/proprietors that merely speaking about machine learning is not really sufficient. At least one example should be given in the description.