kFOIL: Learning Simple Relational Kernels
Niels Landwehr¹ and Andrea Passerini² and Luc De Raedt¹ and Paolo Frasconi²
¹Albert-Ludwigs Universität, Freiburg, Germany
²Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Florence, Italy
{landwehr,deraedt}@informatik.uni-freiburg.de
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

A novel and simple combination of inductive logic programming with kernel methods is presented. The kFOIL algorithm integrates the well-known inductive logic programming system FOIL with kernel methods. The feature space is constructed by leveraging FOIL search for a set of relevant clauses. The search is driven by the performance obtained by a support vector machine based on the resulting kernel. In this way, kFOIL implements a dynamic propositionalization approach. Both classification and regression tasks can be naturally handled. Experiments in applying kFOIL to well-known benchmarks in chemoinformatics show the promise of the approach.

Various successes have been reported in applying inductive logic programming (ILP) techniques to challenging problems in bio- and chemoinformatics, cf. e.g. (Bratko & Muggleton 1995). These successes can, to a large extent, be explained by the use of an expressive general purpose representation formalism that allows one to deal with structured data, to incorporate background knowledge in the learning process, and to obtain hypotheses in the form of a small set of rules that are easy to interpret by domain experts.

On the other hand, support vector machines and kernel methods in general have revolutionized the theory and practice of machine learning in the past decade. These methods do not only yield highly accurate hypotheses; they are also grounded in a solid mathematical theory. However, dealing with structured data and employing background knowledge is harder, as it typically requires one to develop a novel kernel for the specific problem at hand, which is a non-trivial task. Also, the resulting hypotheses are hard to interpret by experts.

Given these developments, it can be no surprise that several researchers have started to combine and integrate ideas from ILP with those from support vector machines. First, there has been a significant interest in developing kernels for structured data, cf. (Gaertner 2003) for an overview, in particular for sequences, trees, graphs, and even individuals described in high-order logic (Gaertner, Lloyd, & Flach 2004). All these kernels are fixed before learning takes place and, to the best of the authors’ knowledge, a kernel method that directly learns from relational representations is still missing. Second, there is the idea of static propositionalization, in which an ILP problem is turned into a propositional one by pre-computing a typically large set of features, cf. e.g. (Muggleton, Amini, & Sternberg 2005), and then using traditional SVM learning on the resulting representation. An extension of this approach transforms the relational representation into a structured one, e.g. by computing proof-trees for so-called visitor programs (Passerini, Frasconi, & De Raedt 2006). Third, as kernels are closely related to similarity measures, work on distance based relational learning (Ramon & Bruynooghe 1998; Kirsten, Wrobel, & Horváth 2001) should also be mentioned. The drawback of these approaches is that the resulting models are still complex and hard to interpret. In addition, the user typically needs to specify additional information to restrict the number of features generated in the propositionalization process or to encode the distance function, which is often a non-trivial task.

The approach taken in this paper is different. The key idea is to dynamically induce a small set of clauses using a FOIL-like covering algorithm (Quinlan 1990) and to use these as features in standard kernel methods. Applying rule-learning principles leads to a typically small set of rules or features, which are (due to the use of a relational representation) also easy to interpret. Using these features to define a kernel leads to similarity measures amongst relational examples and also allows one to directly tackle a wide variety of learning tasks including classification and regression with support vector machines. Especially the uniform treatment of classification and regression is appealing from an ILP perspective, as these typically require rather different techniques (with possibly the exception of decision trees (Kramer 1996)). In contrast to the three types of approaches mentioned earlier, the kernel or similarity measure is being learned. Also, whereas the resulting model is still a kind of propositionalization, the features are learned dynamically and not pre-computed in advance. Thus a dynamic propositionalization technique results, which is similar in spirit to the nFOIL system (Landwehr, Kersting, & De Raedt 2005), a method that combines FOIL with naïve Bayes and proved to yield significant improvements over traditional ILP methods such as Aleph (an ILP system developed by Ashwin Srinivasan¹) on a number of benchmark problems.
The above sketched idea has been incorporated in the kFOIL algorithm and has been elaborated for classification and regression. It has been evaluated experimentally on a number of well-known benchmark problems.

¹ http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/

We start from an inductive logic programming perspective and then extend it towards the use of kernels.

Traditional ILP approaches tackle the following problem:

Given
• a background theory B, in the form of a set of definite clauses, i.e., clauses of the form h ← b1, …, bk where h and the bi are logical atoms;
• a set of examples E in the form of ground facts of an unknown target function y; y maps examples to {+1, −1} (denoting {true, false}) in a classification setting, or alternatively to R, the reals, in a regression setting;
• a language of clauses L, which specifies the clauses that are allowed in hypotheses;
• a f(e, H, B) function, which returns the value of the hypothesis H on the example e w.r.t. the background theory B;
• a score(E, H, B) function, which specifies the quality of the hypothesis H w.r.t. the data E and the background theory B;
Find a hypothesis H ⊂ L that maximizes score(E, H, B).

In a classification setting, the goal typically is to find a complete and consistent concept-description, i.e., a set of clauses that cover all positive and no negative examples. This can be formalized within our framework by making the following choices for f(e, H, B) and score:
• f(e, H, B) = +1 if B ∪ H |= e (i.e., e is entailed by B ∪ H), and −1 otherwise;
• score(E, H, B) = training set accuracy.

In a regression setting, the goal is typically to find a hypothesis H that minimizes a measure such as the root mean squared error between the target y(e) and the prediction f(e, H, B).

Let us now show how kFOIL can be formulated within the above sketched definition of inductive logic programming. The notions of examples, language, hypotheses and background theory remain essentially the same. However, it is extended by a notion of similarity between pairs of examples e1, e2 that is defined (as for other kernel methods) by a kernel function. From an ILP point of view, this should take into account the hypothesis H and the background theory B. Thus kFOIL requires a kernel K of the form K(e1, e2, H, B). As the background theory B is fixed throughout the whole learning process, we will from now on omit this argument from the notation. The function K plays a role similar to that of the distances between first-order logic objects used in relational learning (Ramon & Bruynooghe 1998; Kirsten, Wrobel, & Horváth 2001). A support vector machine will then be used in combination with the kernel K to define the f(e, H, B) function.

To define K(e1, e2, H), it is convenient to first propositionalize the examples e1 and e2 using H and B and then to employ existing kernels on the resulting problem. The natural way of doing this is to map each example e onto a vector ϕH(e) over {0, 1}^n with n = |H|, having ϕH(e)i = 1 if B ∪ {ci} |= e for the i-th clause ci ∈ H, and 0 otherwise.

Example 1 Consider the following background theory B, which describes the structure of molecules:

atm(m1, a1_1, c, 22, −0.11)    bond(m1, a1_1, a1_2, 7)
atm(m1, a1_26, o, 40, −0.38)   bond(m1, a1_18, a1_26, 2)
atm(m2, a2_1, c, 22, −0.11)    bond(m2, a2_1, a2_2, 7)
atm(m2, a2_26, o, 40, −0.38)   bond(m2, a2_18, a2_26, 7)

together with a hypothesis H that includes the clauses

pos(X) ← atm(X, A, c, 22, C), atm(X, B, E, 22, 0.02)
pos(X) ← atm(X, A, c, 27, C), bond(X, A, B, 2)

H as a logical theory covers both examples. Clauses c[...] succeed on the first example and clauses c[...] on the second. Consequently, in the feature space spanned by the truth values of the clauses, the examples are represented as [...]

Let us now look at the effect of defining kernels on the propositionalized representation. A simple linear kernel KL is the dot product of the two feature vectors,

KL(e1, e2, H) = ⟨ϕH(e1), ϕH(e2)⟩.

The resulting kernel KL can be interpreted as the number of clauses in H that succeed on both examples.

Let us formalize the linear kernel introduced in the above example:

KL(e1, e2, H) = #entH(e1 ∧ e2),

where #entH(f) = |{c ∈ H | B ∧ {c} |= f}| denotes the number of clauses in H that together with B logically entail f. Intuitively, this implies that two examples are similar if they share many structural features. Which structural features to look at when computing similarities is encoded in the hypothesis H.
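The propositionalization map ϕH and the linear kernel KL described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the entailment test B ∪ {c} |= e is replaced by one hypothetical Python predicate per clause, and the toy molecules, the helper `has_atom`, and the three clause tests in `H` are invented for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A toy "molecule": a list of (element, atom_type) pairs. Hypothetical data.
Molecule = Dict[str, List[Tuple[str, int]]]
# Stand-in for the entailment test "B ∪ {c} |= e": one boolean test per clause.
Clause = Callable[[Molecule], bool]


def has_atom(element: str, atom_type: int) -> Clause:
    """Build a clause test that succeeds if the molecule contains such an atom."""
    def test(mol: Molecule) -> bool:
        return (element, atom_type) in mol["atoms"]
    return test


def propositionalize(example: Molecule, hypothesis: Sequence[Clause]) -> List[int]:
    """phi_H(e): component i is 1 if clause i succeeds on e, and 0 otherwise."""
    return [1 if clause(example) else 0 for clause in hypothesis]


def linear_kernel(e1: Molecule, e2: Molecule, hypothesis: Sequence[Clause]) -> int:
    """K_L(e1, e2, H): number of clauses in H that succeed on both examples."""
    phi1 = propositionalize(e1, hypothesis)
    phi2 = propositionalize(e2, hypothesis)
    return sum(a * b for a, b in zip(phi1, phi2))


# Hypothetical hypothesis H with three clause tests, and two toy molecules.
H = [has_atom("c", 22), has_atom("o", 40), has_atom("c", 27)]
m1 = {"atoms": [("c", 22), ("o", 40)]}
m2 = {"atoms": [("c", 22), ("c", 27)]}

print(propositionalize(m1, H))   # [1, 1, 0]
print(propositionalize(m2, H))   # [1, 0, 1]
print(linear_kernel(m1, m2, H))  # 1
```

Only the first clause test succeeds on both toy molecules, so the kernel value is 1. Changing H changes the feature space and hence the similarity, which is the sense in which the hypothesis encodes which structural features are compared.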
This formalism can be generalized to standard polynomial (KP) and Gaussian (KG) kernels. Using a polynomial kernel, the interpretation in terms of logical entailment is

KP(e1, e2, H) = (#entH(e1 ∧ e2) + 1)^p,

which amounts to considering conjunctions of up to p clauses which logically entail the two examples, as can easily be shown by explicitly computing the feature space induced by the kernel. Using a Gaussian kernel turns out to [...] where the argument of entH can be interpreted as a kind of symmetric difference between the two examples.

Given kernels over examples in a propositional representation, we only need to employ them within traditional support vector machine methods to obtain effective classification and regression techniques.

For instance, using the standard support vector method for classification, the f(e, H, B) function is expressed as

f(e, H, B) = sign( Σ_{i=1..m} αi y(ei) K(e, ei, H) + b )    (1)

where {e1, …, em} are the training examples and y(ei) = 1 if ei is a positive example and y(ei) = −1 otherwise. Similarly, using support vector regression one obtains [...] (2) [...] obtained from the theory H using standard support vector [...].

By now, we have formally specified the learning setting addressed by kFOIL. It is the instantiation of the standard ILP problem sketched earlier with the f(e, H, B) function just defined. As scoring functions, kFOIL employs training set accuracy for classification and Pearson correlation or root mean squared error for regression. The key point is that kFOIL, like standard inductive logic programming techniques, must find the right hypothesis H that maximizes its score. Note that this approach differs significantly from the static propositionalization approaches, where H is actually pre-computed and fixed. As kFOIL learns the hypothesis H, this implies that the kernel itself is being learned.

To learn H, kFOIL employs an adaptation of the well-known FOIL algorithm (Quinlan 1990), which essentially implements a separate-and-conquer rule learning algorithm in a relational setting.

Algorithm 1 (generic FOIL): [...] let c be the c′ ∈ ρ(c) with the best score [...]

The generic FOIL algorithm is sketched in Algorithm 1. It repeatedly searches for clauses that score well with respect to the data set and the current hypothesis and adds them to the current hypothesis. The examples covered by a learned clause are removed from the training data (in the update function). In the inner loop, it greedily searches for a clause that scores well. To this aim, it employs a general-to-specific hill-climbing search strategy. Let p(X1, …, Xn) denote the predicate that is being learned (e.g., pos(X) for a simple classification problem). Then the most general clause, which succeeds on all examples, is "p(X1, …, Xn) ←". The set of all refinements of a clause c within the language bias is produced by a refinement operator ρ(c). For our purposes, a refinement operator just specializes a clause h ← b1, …, bk by adding a new literal bk+1, though other refinements have also been used in the literature. This type of algorithm has been successfully applied to a wide variety of problems in ILP. Many different scoring functions and stopping criteria have been employed.

The search in kFOIL follows the generic search strategy outlined in Algorithm 1. However, there are three key differences, which will now be outlined. First, when scoring a refined clause, a support vector machine based on the current kernel including the clause has to be built and its performance must be evaluated on the training data. This can be achieved by introducing a loss function V(y(e), f(e)) that measures the cost of predicting f(e) when the target is y(e). Thus score(E, H ∪ {c′}, B) is computed in a "wrapper" approach as follows:

(α1, …, αm, b) := train_svm(E, H ∪ {c′}, B)
[...]

Here train_svm(E, H, B) trains a support vector machine using the kernel defined by H, while f(e, H, B) computes the prediction according to Equation 1 or Equation 2 for the classification or regression case respectively.

Second, kFOIL cannot use a separate-and-conquer approach. Because the final model in FOIL is the logical disjunction of the learned clauses, (positive) examples that are already covered by a learned clause can be removed from the training data (in the update(E, H) function in Algorithm 1).
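The separate-and-conquer loop of the generic FOIL algorithm can be sketched as follows. This is a simplified propositional stand-in, not the relational algorithm of the paper: a clause is modelled as a set of boolean attribute indices that must all be 1, the refinement operator adds one attribute test, the score is training set accuracy of the disjunction, and the limits `max_clauses` and `max_literals` as well as all function names are assumptions made for the illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

Example = Tuple[Tuple[int, ...], int]  # (boolean attribute vector, label in {+1, -1})
Clause = FrozenSet[int]                # attribute indices that must all equal 1


def covers(clause: Clause, x: Tuple[int, ...]) -> bool:
    return all(x[i] == 1 for i in clause)


def refine(clause: Clause, n_attrs: int) -> List[Clause]:
    """rho(c): specialize a clause by adding one new literal."""
    return [clause | {i} for i in range(n_attrs) if i not in clause]


def accuracy(examples: List[Example], hypothesis: List[Clause]) -> float:
    """Training set accuracy of the disjunction of the clauses."""
    correct = 0
    for x, y in examples:
        pred = 1 if any(covers(c, x) for c in hypothesis) else -1
        correct += pred == y
    return correct / len(examples)


def remove_covered_positives(examples: List[Example], hypothesis: List[Clause]) -> List[Example]:
    """update(E, H) as in FOIL: drop positives already covered by H."""
    return [(x, y) for x, y in examples
            if not (y == 1 and any(covers(c, x) for c in hypothesis))]


def foil(examples: List[Example], n_attrs: int,
         score: Callable = accuracy, update: Callable = remove_covered_positives,
         max_clauses: int = 5, max_literals: int = 3) -> List[Clause]:
    hypothesis: List[Clause] = []
    remaining = list(examples)
    while len(hypothesis) < max_clauses and any(y == 1 for _, y in remaining):
        clause: Clause = frozenset()  # most general clause "p(X) <-"
        best = score(remaining, hypothesis + [clause])
        while len(clause) < max_literals:  # greedy general-to-specific search
            candidates = refine(clause, n_attrs)
            if not candidates:
                break
            top_score, top_clause = max(
                ((score(remaining, hypothesis + [c]), c) for c in candidates),
                key=lambda pair: pair[0])
            if top_score <= best:
                break
            best, clause = top_score, top_clause
        if not clause:  # no refinement improved on the most general clause
            break
        hypothesis.append(clause)
        remaining = update(remaining, hypothesis)
    return hypothesis


# Toy data: an example is positive exactly when attributes 0 and 1 are both 1.
E = [((1, 1, 0), 1), ((1, 1, 1), 1), ((1, 0, 0), -1),
     ((0, 1, 0), -1), ((0, 0, 1), -1), ((0, 0, 0), -1)]
print(foil(E, n_attrs=3))  # [frozenset({0, 1})]
```

Because `score` and `update` are passed in as functions, the same loop can be reused with a different scoring function or with an update rule that leaves the training set unchanged.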
In kFOIL, this notion of coverage is lost, and the training set is not changed between iterations. Therefore, update(E, H) returns E. Finally, FOIL stops when it fails to find a clause that covers additional positive examples. As an equally simple stopping criterion, learning in kFOIL is stopped when the improvement in score between two successive iterations falls below a given threshold.

The repeated support vector optimizations performed during the search are computationally expensive. However, the costs can be reduced with simple tabling techniques, and by exploiting the fact that the relational example space is mapped to a much simpler propositional space by ϕH. There, different relational examples are represented by the same vector, and can be merged to one example with a higher weight. In our experimental study, this typically reduced the time needed to learn a model by one to two orders of magnitude.

In a preliminary evaluation, we compared alternative scores to guide FOIL search, including kernel target alignment (Lanckriet et al. 2004) and various loss functions V in the wrapper-style score algorithm above (hinge loss, 0-1 loss, margin-based conditional likelihood). Kernel target alignment does not require SVM training but the speedup is marginal due to the inherent cost of FOIL and the optimizations outlined above. In addition, local optima problems occurred in conjunction with greedy search. 0-1 loss for classification and quadratic loss for regression yielded the most stable search results and were employed in the experiments reported below. These criteria are known to be associated with the risk of overfitting in the case of propositional feature selection (Kohavi & John 1997). However, the use of independent data (e.g. by using a leave-one-out estimated loss as suggested in (Reunanen 2003)) would increase complexity significantly and the more efficient approach of estimating leave-one-out bounds resulted in unstable search.

The following questions concern the dynamic propositionalization approach developed in kFOIL:
(Q1) Is kFOIL competitive with state-of-the-art inductive logic programming systems for classification?
(Q2) Is kFOIL competitive with state-of-the-art inductive logic programming systems for regression?
(Q3) Is kFOIL competitive with other dynamic propositionalization approaches, in particular to nFOIL?
(Q4) Is kFOIL competitive with static propositionalization approaches?

We conducted experiments on nine benchmark datasets. On Mutagenesis (Srinivasan et al. 1996) the problem is to predict the mutagenicity of a set of compounds. We used atom and bond information only. For Alzheimer (King, Srinivasan, & Sternberg 1995), the aim is to compare four desirable properties of drugs against Alzheimer’s disease: inhibit amine reuptake (686 examples), low toxicity (886 examples), high acetyl cholinesterase inhibition (1326 examples), and good reversal of memory deficiency (642 examples). The NCTRER dataset has been extracted from the EPA’s DSSTox NCTRER Database (Fang et al. 2001). It contains structural information about a diverse set of 232 natural, synthetic and environmental estrogens and classifications with regard to their binding activity for the estrogen receptor. Again, we used atom and bond information only. In the Biodegradability domain (Blockeel et al. 2004) the task is to predict the biodegradability of 328 chemical compounds based on their molecular structure and global molecular measurements. This is originally a regression task, but can also be transformed into a classification task by putting [...].

On Mutagenesis, Alzheimer, and NCTRER, kFOIL was compared to nFOIL, the state-of-the-art ILP system Aleph and a static propositionalization approach. We used a variant of the relational frequent query miner WARMR (Dehaspe, Toivonen, & King 1998) for static propositionalization as WARMR patterns have been shown to be effective propositionalization techniques on similar benchmarks in inductive logic programming (Ashwin Srinivasan 1999). The variant used was c-ARMR (De Raedt & Ramon 2004), which allows one to remove redundancies amongst the found patterns by focusing on so-called free patterns. c-ARMR was used to generate all free frequent patterns in the data sets where the frequency threshold was set to 20%. We used at most 5000 of the generated patterns as features to generate (binary) propositional representations of the datasets. On the propositionalized datasets, a cross-validation of a support vector machine was then performed². To evaluate the regression performance of kFOIL, we reproduced the experimental setting used in (Blockeel et al. 2004) and compared to the results obtained in that study for Tilde and S-CART.

² Note that this methodology puts this approach at a slight advantage and might yield over-optimistic results.

As the goal of the experimental study was to verify that the presented approach is competitive to other state-of-the-art techniques, and not to boost performance, we did not try to specifically optimize any parameter. For nFOIL, we used the default settings: maximum number of clauses in a hypothesis was set to 25, maximum number of literals in a clause to 10 and the threshold for the stopping criterion to 0.1%. For kFOIL, we used exactly the same parameters. For both algorithms, a beam search with beam size 5 instead of simple greedy search was performed, as in (Landwehr, Kersting, & De Raedt 2005). Furthermore, a polynomial kernel of degree 2 was used, the regularization constant C was set to 1 for classification and 0.01 for regression, and the ε-tube parameter was set to 0.001. All SVM parameters were set identical for all datasets, and kept fixed during the search for clauses.

Table 1: Average predictive accuracy results on Mutagenesis, Alzheimer and NCTRER for kFOIL, nFOIL, Aleph and static propositionalization. On Mutagenesis r.u. a leave-one-out cross-validation was used (which, combined with the small size of the dataset, explains the high variance of the results), on all other datasets a 10 fold cross-validation. • indicates that the result for kFOIL is significantly better than for other method (paired two-sided t-test, p = 0.05).

Table 2: Results on the Biodegradability dataset. The results for Tilde and S-CART have been taken from (Blockeel et al. 2004). 5 runs of 10 fold cross-validation have been performed, on the same splits into training and test set as used in (Blockeel et al. 2004). For classification, average accuracy is reported, for regression, Pearson correlation and RMSE. • indicates that the result for kFOIL is significantly better than for other method (unpaired two-sided t-test, p = 0.05).

Table 1 shows cross-validated predictive accuracy results on Mutagenesis, Alzheimer, and NCTRER. Both kFOIL and nFOIL on average yield higher predictive accuracies than the ILP system Aleph and static propositionalization.
kFOIL significantly outperforms nFOIL on two datasets, and a Wilcoxon Matched Pairs Test applied to the results of kFOIL and nFOIL on the different datasets shows that kFOIL reaches significantly higher predictive accuracy on average (p=0.05). These results affirmatively answer questions Q1, Q3 and Q4.

Table 2 shows results for the Biodegradability dataset. For regression, we ran kFOIL with scoring based on correlation and root mean squared error, and measured the result using the corresponding evaluation criterion. The results obtained show that kFOIL is competitive with the first-order decision tree systems S-CART and Tilde for classification. For regression, it is competitive at maximizing correlation, and slightly superior at minimizing RMSE. Thus, question Q2 can be answered affirmatively as well.

kFOIL returned between 2.8 and 22.9 clauses averaged over the folds of the cross-validation, depending on the dataset. Interestingly, the number of clauses in H was always lower than for nFOIL. On the datasets we examined, building a kFOIL model takes up to 10 minutes for classification, and up to 30 minutes for regression. This is of the same order of magnitude as the runtime for the other systems.

Finally, we give an example of a learned clause which is meaningful to human domain experts: on the NCTRER dataset, kFOIL learned the following clause:

[...] ← atm(B, o), bd_atm(B, C, c, −), bd_atm(C, D, c, =),
        bd_atm(C, E, c, −), bd_atm(E, F, c, =),
        bd_atm(G, D, c, −), bd_atm(F, H, I, −).

It encodes an aromatic ring with a phenol group (a so-called phenolic ring). In the study presented in (Fang et al. 2001), the presence of a phenolic ring is identified by human experts as one of the main factors that determine estrogen-binding activity of [...].

We have presented the kFOIL system, which introduces a simple integration of inductive logic programming methods with support vector learning. kFOIL can be considered a propositionalization approach. Two types of propositionalization approaches have been discussed: static ones, in which a typically large set of features is pre-computed, and dynamic propositionalization, in which features are incrementally and greedily generated. As the generation of clauses is driven by the performance of the support vector machine, kFOIL performs dynamic propositionalization.

Hence, kFOIL is related to Support Vector Inductive Logic Programming, which combines static propositionalization with support vector learning, and systems like SAYU (Davis et al. 2005), nFOIL, and Structural Logistic Regression (Popescul et al. 2003), which all combine dynamic propositionalization with probabilistic models. In contrast, kFOIL employs kernel based learning, which allows one to tackle classification and regression problems in a uniform framework. Also, kFOIL improved upon nFOIL in terms of predictive accuracy in our experimental study.

From a kernel machine perspective, kFOIL can also be seen as constructing the kernel based on the available data and therefore it has interesting connections to methods that attempt to learn the kernel from data. (Lanckriet et al. 2004) works in the transductive setting (input portion of the test data available when training) and uses a semidefinite programming algorithm for computing the optimal kernel matrix. Algorithms for learning the kernel function include the idea of using a hyperkernel (that spans a Hilbert space of kernel functions) (Ong, Smola, & Williamson 2002) and the use of regularization functionals (Micchelli & Pontil 2005). These approaches are typically more principled than kFOIL (as they learn the kernel by solving well-posed optimization problems). However, the formulation by which the kernel is obtained as a convex combination of other kernel functions would be difficult or impossible to apply in the context of dynamic feature construction in a fully-fledged relational setting. Furthermore, to the best of the authors’ knowledge, no other method proposed so far can learn kernels defined by small sets of interpretable first-order rules.

Acknowledgements The authors would like to thank Kristian Kersting and the anonymous reviewers for valuable comments. The research was supported by the European Union IST programme, contract no. FP6-508861, Application of Probabilistic Inductive Logic Programming II.

Ashwin Srinivasan, Ross D. King, D. B. 1999. An Assessment of ILP-Assisted Models for Toxicology and the PTE-3 Experiment. In Proc. of ILP’99.
Blockeel, H.; Dzeroski, S.; Kompare, B.; Kramer, S.; Pfahringer, B.; and Laer, W. 2004. Experiments in Predicting Biodegradability. Appl. Art. Int. 18(2):157–181.
Bratko, I., and Muggleton, S. 1995. Applications of Inductive Logic Programming. Comm. of the ACM 38(11):65– [...]
Davis, J.; Burnside, E.; de Castro Dutra, I.; Page, D.; and Costa, V. S. 2005. An Integrated Approach to Learning Bayesian Networks of Rules. In Proc. of ECML’05, 84– [...]
De Raedt, L., and Ramon, J. 2004. Condensed Representations for Inductive Logic Programming. In Proc. of KR’04.
Dehaspe, L.; Toivonen, H.; and King, R. 1998. Finding Frequent Substructures in Chemical Compounds. In Proc. [...]
Fang, H.; Tong, W.; Shi, L.; Blair, R.; Perkins, R.; Branham, W.; Hass, B.; Xie, Q.; Dial, S.; Moland, C.; and Sheehan, D. 2001. Structure-Activity Relationships for a Large Diverse Set of Natural, Synthetic, and Environmental Estrogens. Chemical Research in Toxicology 14(3):280–294.
Gaertner, T.; Lloyd, J.; and Flach, P. 2004. [...]
Gaertner, T. 2003. A Survey of Kernels for Structured Data. SIGKDD Explorations 5(1):49–58.
King, R.; Srinivasan, A.; and Sternberg, M. 1995. Relating Chemical Activity to Structure: an Examination of ILP Successes. New Generation Computing 13(2,4):411–433.
Kirsten, M.; Wrobel, S.; and Horváth, T. 2001. Distance based approaches to relational learning and clustering. In Relational Data Mining, 213–230. Springer.
Kohavi, R., and John, G. 1997. Wrappers for feature subset selection. Art. Int. 97(1–2):273–324.
Kramer, S. 1996. Structural Regression Trees. In Proc. of [...]
Lanckriet, G. R. G.; Cristianini, N.; Bartlett, P.; Ghaoui, L. E.; and Jordan, M. I. 2004. Learning the Kernel Matrix with Semidefinite Programming. J. Mach. Learn. Res. [...]
Landwehr, N.; Kersting, K.; and De Raedt, L. 2005. nFOIL: Integrating Naïve Bayes and FOIL. In Proc. of [...]
Micchelli, C. A., and Pontil, M. 2005. Learning the Kernel Function via Regularization. J. Mach. Learn. Res. 6:1099– [...]
Muggleton, S.; Amini, A.; and Sternberg, M. 2005. Support Vector Inductive Logic Programming. [...]
Ong, C. S.; Smola, A. J.; and Williamson, R. C. 2002. Hyperkernels. In NIPS 15.
Passerini, A.; Frasconi, P.; and De Raedt, L. 2006. Kernels on prolog proof trees: Statistical learning in the ILP setting. [...]
Popescul, A.; Ungar, L.; Lawrence, S.; and Pennock, D. 2003. Statistical Relational Learning for Document Mining. In Proc. of ICDM’03, 275–282.
Quinlan, J. 1990. Learning Logical Definitions from Relations. Machine Learning 5:239–266.
Ramon, J., and Bruynooghe, M. 1998. A Framework for Defining Distances Between First-Order Logic Objects. In [...]
Reunanen, J. 2003. Overfitting in making comparisons between variable selection methods. J. Mach. Learn. Res. [...]
Srinivasan, A.; Muggleton, S.; King, R.; and Sternberg, M. 1996. Theories for Mutagenicity: a Study of First-Order and Feature-Based Induction. Art. Int. 85:277–299.
