CF-induction (ver. 0.45)

CF-induction is a sound and complete ILP (Inductive Logic Programming) method for finding hypotheses in full-clausal theories.

This page explains how to use the latest version (ver. 0.45) of CF-induction.  

1. Introduction

Given a background theory B, examples E, and mode declarations M
CF-induction generates a hypothesis H such that 

A)  BH entails E;
B)  BH is consistent;
C)  H belongs to the mode language L(M)  

B,  E  and H  are represented as a full-clausal theory that can include non-Horn clauses.

L(M) is a language bias that syntactically describe the target hypotheses to be found.
This bias is introduced in many modern ILP systems like Progol (or Aleph).

The version 0.45 can find a relevant hypothesis H based on a heuristic search strategy 
(See the paper [Y. Yamamoto et al, ILP2012] for more detail on the search algorithm). 

The system is currently designed is as follows:

Currently, it has been implemented by JAVA, and is available from the below

2. How to use the software

If you have already installed Java SE, then the software will work with the following command:

% cd CF-induction_0.45

% java ki.CF problem/1jml2004/ex1.txt

If it does not work, please check this page for trouble shooting too.

Please set the path to the bin directory in the Java SDK package so as to use the java command, and then move the current directory to CF-induction0.45b. 

If the software successfully work, then we have the following output

=================== (from here) =======================

  Executing time (step: loading the problem) : 21[msec]
>> Dualizaing observations
>> Computing the new characterisitc clauses 
>> Computing the characteristic clauses
>> Computing the positive bottom theory
>> Computing the negative bottom theory
   Executing time (step: computing two bottom theories): 497 [msec]
>> Computing the hypotheses
Hypo index: 0
-- Current hypothesis
[mortal(X_0), -human(X_0)]

   Executing time (step: generalization): 11 [msec]
   Executing time (total): 1643 [msec]
=================== (end here) =======================

This example (problem/1jml2004/ex1.txt) means that... 
given the background theory  B = human(s), examples E = mortal(s),
and the mode declarations M = { modeh(mortal(+life)), modeb(1, human(+life)) },
Then, CF-induction has computed the hypothesis H =  human(X) ⊃ mortal(X).

Please also see examples under the ``problem'' directory and solve them by CF-induction.
(The desirable command is written in each example file,  so, please try that command.)

3. How to write input files for the software

The input file mainly consists of three parts: the one is for representing the background theory,
the second is for representing examples, and the third is for representing the language bias.

3.1 Background theory  B

Let C be a clause odd(X) ⊃ even(s(X)). Then C is written as follows:

cnf(c, bg, [-odd(X), even(X), successor(X, Y)]).

The first term ``c'' denotes the name of the clause C. We can write it arbitrarily. 
The second term ``bg'' denotes that this clause is included in B. Note that any function symbol does not currently allowed due to the computational efficiency. Instead, we can add some new predicate which has the same meaning of the function. 
Note also that our representation formalisms are not limited to Horn theories).

3.2 (Positive) Examples E

Let E be a clause odd(3). Then E is written as follows:

cnf(e, obs, [odd(3)]).

The first term ``e'' denotes the name of the clause E
The second term ``obs'' denotes that this clause is included in the positive examples.
Note here that if you need to treat some negative examples like -odd(2), 
then it is enough to make them included in the background theory. 

3.3 Language bias 

Currently, we can use 3 types of the language bias in CF-induction as follows.

(A) Mode declarations:  
The syntax of them is the same as the original one in Progol-like ILP systems. 
Mode declarations are used to syntactically describe the target hypotheses.

modeh(1, odd(+number)).
modeb(1, even(-number)).
modeb(1, succesor(+number, -number)).

(B) Type: 

This bias is used to define the type of each ground term like the below example:


type(0, number).
type(1, number).
type(2, number).
type(3, number).
type(4, number).
type(5, number).

(C) Induction field:

This defines the ground literals that can be included in the ground hypotheses.
Usually, we do not need to write the induction field, since the software automatically
compute this from the mode declarations and types. But, due to the computational efficiency,
we often need to restrict the ground literals explicitly (See problem/6plus/plus.txt for instance)

Besides, we can declare the maximal size and length of clauses in the target hypothesis as follows:



length[0:4], size[0:3]
+-plus(1, 1, 2),
+-plus(1, 2, 3),
+-plus(2, 1, 3),
+-plus(1, 3, 4),
+-plus(3, 1, 4),
+-plus(1, 4, 5),
+-plus(4, 1, 5),
+-plus(2, 2, 4),
+-plus(2, 3, 5),
+-plus(3, 2, 5)

In this induction field, we find a hypothesis which maximally consists of 3 clauses 
(by size[0:3]) each of which has clauses not greater than 4 (by length[0:4]).  

4. How to run the experiments by CF-induction

Please see the ReadMe.txt file in the experiments directory.

Yoshitaka Yamamoto,
Mar 17, 2013, 9:59 PM