[Roxygen-devel] Parsing roxygen blocks

Hadley Wickham <hadley at rice.edu>

[splitting out into separate emails instead of one huge one]

Yeah, that's the easy part ;) Now how do you find which comment block
they were associated with?
As far as I can see, there's no other way to do this apart from
parsing the call. I also don't see another way to document user
created functions that modify the global state - e.g. add_roccer in
roxygen3 (this wouldn't be necessary if we used S4, but the principle
remains).

HW> Could you flesh out this example a bit more? I don't understand why
HW> you'd want to document objects that aren't evaluated by the user.

Ah sorry, that was stupid. I meant

eval({ a <- generate_object_a()
       b <- generate_object_b()})

Roxygen can adopt a convention: if two declarations immediately follow
each other, they should be documented in the same file. For example,

foo <- function(a) ..
boo <- function(a) ..

will put both foo and boo in the same file. Currently one needs two
documentation blocks and an rdname tag, if I am not mistaken.

Again, the challenge is to connect the objects to the roxygen block.
Without parsing the call, how do you know that the comment block
should be attached to a and b?
One option would be to evaluate each source block as you encounter it.
That would resolve the problem above (and would make it possible to do
`my_class <- setClass` and still know that it was creating a class).
The problem is that the S4 functions don't seem to be very good at

Precisely what I had in mind (I was convinced that roxygen is already
doing that and didn't bother to explain).

Each code chunk should be evaluated in a new environment. Then
environment is inspected and all the new objects/classes/methods are
returned.
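
A minimal sketch of that step, assuming a chunk arrives as parsed
expressions. The helper name eval_chunk is hypothetical and is not part
of roxygen; only plain assignments are detected here.

```r
# Hypothetical helper: evaluate one parsed chunk in a fresh environment
# and report the names that the chunk bound there.
eval_chunk <- function(exprs, parent = globalenv()) {
  env <- new.env(parent = parent)
  for (expr in exprs) eval(expr, env)
  list(env = env, objects = ls(env, all.names = TRUE))
}

chunk <- parse(text = "
  foo <- function(a) a + 1
  boo <- function(a) a - 1
")

res <- eval_chunk(chunk)
print(res$objects)
```

Both foo and boo are found without looking at the call syntax, so one
comment block could be attached to every object the chunk creates.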

There might be an environment (or list) in roxygen namespace holding all
the object guessers (roxy_env_inspectors?). Each of them is called and
should return a list of new objects detected.

The basic one just looks for normally assigned objects,
roxy_env_inspector.S3 looks for S3 tables *in* the evaluation
environment, roxy_env_inspector.S4_classes looks in class table and gets
the classes defined, roxy_env_inspector.S4_methods searches the S4
table.

If a package declares some weirdo side-effect initialization (which is
pretty rare) it should define roxy_env_inspector.Wierdo_thing and add it
to roxy_env_inspectors environment (which should not be sealed in the
package).

a <- setMethod("plot", "numeric", function(x, ...) {})
str(a)

chr "plot"
Which means you'd have to compare states of all the S4 class/method
tables before and after each call. That seems slow and error prone to
me.

Each call to meta functions setClass, setMethod etc creates a table *in*
the evaluation env, if it is a top environment (environment should be
explicitly designated as a top environment for this to work).

Indeed, as far as I can remember, S4 methods and generics are also copied
to the central table for efficiency, but roxygen need not care about
those, local tables are enough.
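
A rough sketch of comparing the local class table before and after a
call. The helper new_s4_classes is hypothetical; the sketch assumes the
chunk passes `where` explicitly and that the scratch environment carries
a `.packageName` so the methods package can treat it as a top-level
environment. Methods and generics are not covered.

```r
library(methods)

# Hypothetical helper: evaluate `expr` in `env`, then report the S4
# classes newly defined in `env` by diffing getClasses() snapshots.
new_s4_classes <- function(expr, env) {
  before <- getClasses(where = env)
  eval(expr, env)
  setdiff(getClasses(where = env), before)
}

env <- new.env(parent = globalenv())
# Assumption: naming the environment keeps methods from inventing a
# package name for it.
assign(".packageName", "scratch", envir = env)

# bquote() splices the environment object itself into the call.
chunk <- bquote(
  Point <- setClass("Point", representation(x = "numeric"), where = .(env))
)

added <- new_s4_classes(chunk, env)
print(added)
```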

Post by Hadley Wickham
If you wanted to write some code to do it, I'd definitely be happy to
consider it, but given that my current system works, I'm not in a big
hurry to rewrite it.

Ok, I will wrap up some proof-of-concept code.

Vitalie

Hadley Wickham

2012-08-29 13:54:57 UTC

Post by Hadley Wickham
One option would be to evaluate each source block as you encounter it.
That would resolve the problem above (and would make it possible to do
`my_class <- setClass` and still know that it was creating a class).
The problem is that the S4 functions don't seem to be very good at

So you have to evaluate everything twice? Or do you copy over the
results once you've run it? (Is that even possible for S4?)
Otherwise, how do you make sure that the code actually runs?

i.e. how do you evaluate this code?

a <- 1
b <- 2
a + b

That approach also starts to get complicated with S4 because it's
picky about how you create the environment.

Post by Vitalie Spinu
There might be an environment (or list) in roxygen namespace holding all
the object guessers (roxy_env_inspectors?). Each of them is called and
should return a list of new objects detected.
The basic one just looks for normally assigned objects,
roxy_env_inspector.S3 looks for S3 tables *in* the evaluation
environment, roxy_env_inspector.S4_classes looks in class table and gets
the classes defined, roxy_env_inspector.S4_methods searches the S4
table.

I just don't see the big advantage of this over parsing the call.
This approach will be much more expensive because now you have to run
multiple tests after every single expression. (Also I'm pretty sure
your approach for S3 won't work, because those S3 tables are created
by namespace definitions, not by evaluating a function)

Post by Vitalie Spinu
If a package declares some weirdo side-effect initialization (which is
pretty rare) it should define roxy_env_inspector.Wierdo_thing and add it
to roxy_env_inspectors environment (which should not be sealed in the
package).

So now you are defining your own object system ;) Why shouldn't
these functions just be single-method classes that inherit from
RoxyDetector or similar?

a <- setMethod("plot", "numeric", function(x, ...) {})
str(a)

chr "plot"
Which means you'd have to compare states of all the S4 class/method
tables before and after each call. That seems slow and error prone to
me.

I have had many problems trying to do this right for devtools, so I'm
not so sure it's that simple.

Post by Hadley Wickham
If you wanted to write some code to do it, I'd definitely be happy to
consider it, but given that my current system works, I'm not in a big
hurry to rewrite it.

Ok, I will wrap up some proof-of-concept code.

Looking forward to it!

Hadley

--
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/

Vitalie Spinu

2012-08-29 16:11:35 UTC

Hadley Wickham <hadley at rice.edu>
Each code chunk should be evaluated in a new environment. Then
environment is inspected and all the new objects/classes/methods are
returned.

Indeed, this is a slight complication which is solved (is it?) by
stacking the evaluation environments. That is, each new code chunk is
evaluated in an environment whose parent is the environment from the
previous evaluation.

Another option is to make snapshots for each evaluation and compare
those after each invocation.
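
A small sketch of the stacking idea, using the a / b / a + b example
from earlier in the thread. The helper eval_stacked is hypothetical;
chunks are given as strings only to keep the example short.

```r
# Hypothetical helper: evaluate each chunk in a new environment whose
# parent is the environment of the previous chunk, and record the names
# each chunk bound in its own environment.
eval_stacked <- function(chunks, base = globalenv()) {
  env <- base
  found <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    env <- new.env(parent = env)  # child of the previous chunk's env
    for (expr in parse(text = chunks[[i]])) eval(expr, env)
    found[[i]] <- ls(env)
  }
  found
}

chunks <- c("a <- 1", "b <- 2", "a + b")
found <- eval_stacked(chunks)
str(found)
```

The third chunk can see a and b through the parent chain, and it binds
nothing itself, so its entry is an empty character vector.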

Post by Hadley Wickham
That approach also starts to get complicated with S4 because it's
picky about how you create the environment.

I will revisit this. As far as I remember the internals are pretty
straightforward.

There might be an environment (or list) in roxygen namespace holding all
the object guessers (roxy_env_inspectors?). Each of them is called and
should return a list of new objects detected.
The basic one just looks for normally assigned objects,
roxy_env_inspector.S3 looks for S3 tables *in* the evaluation
environment, roxy_env_inspector.S4_classes looks in class table and gets
the classes defined, roxy_env_inspector.S4_methods searches the S4
table.

I just don't see the big advantage of this over parsing the call.
This approach will be much more expensive because now you have to run
multiple tests after every single expression.

Is the bottleneck in the code evaluation? It looks to me that all of
that is insignificant compared to the parsing and Rd generation.

Post by Hadley Wickham
(Also I'm pretty sure your approach for S3 won't work, because those
S3 tables are created by namespace definitions, not by evaluating a
function)

Hmm, indeed, I have been overoptimistic. But S3 methods are just
functions, and automatic detection is not possible anyway. The user
has to declare them as S3 methods anyway. So nothing is lost with
respect to the current implementation.

If a package declares some weirdo side-effect initialization (which is
pretty rare) it should define roxy_env_inspector.Wierdo_thing and add it
to roxy_env_inspectors environment (which should not be sealed in the
package).

So now you are defining your own object system ;)

Very simple one - just functions, no dispatch no objects. You will need
something similar anyhow, as there are no objects at that stage, and no
OO system can be used.

Post by Hadley Wickham
Why shouldn't these functions just be single-method classes
that inherit from RoxyDetector or similar?

I guess you mean the object_from_call.foo dispatch mechanism here.

The benefit of the RoxyDetector is that in 99.99% of cases the
built-in detectors will do the job. A developer who would like to have
custom documentation for an object of class X doesn't need to bother at
all with object_from_call and pseudo dispatch.

Whatever the textual representation by which object X is generated, X <-
new(..), or X <- X.constructor(), or createObjectX(), or X <- eval(...),
the result will always be the same. It is much easier for the end user,
who is not forced to use a specific declaration for roxygen to work. All
that matters is the end object(s) which the code generates.

In the current implementation, whenever someone would like to have a
different call syntax, you will have to modify the base code.

In .01% of the weirdo cases with side effects, a developer will have to
write the detector function, which is a much simpler concept than pseudo
dispatch on the call name.

I still find the object_from_call mechanism a bit tricky: it's not only
the call name, but also the object name that plays a role. So the call
parser should be very smart, not only about how to detect the call name,
but also about how to detect the name of the object. Am I missing anything?
Where does the 'name' in the last call come from:

roxygen3/R/object-from-call.r, lines 20-34:

object_from_call <- function(call, env) {
  if (is.null(call)) return()

  # Find function, then use match.call to construct complete call
  f <- eval(call[[1]], env)
  if (!is.primitive(f)) {
    call <- match.call(eval(call[[1]], env), call)
  }

  fun_name <- deparse(call[[1]])
  f <- find_fun(str_c("object_from_call.", fun_name))

  if (is.null(f)) return(NULL)
  f(call, name, env)
}

a <- setMethod("plot", "numeric", function(x, ...) {})
str(a)

chr "plot"
Which means you'd have to compare states of all the S4 class/method
tables before and after each call. That seems slow and error prone to
me.

I have had many problems trying to do this right for devtools, so I'm
not so sure it's that simple.

I have done this with ess-developer
(http://ess.r-project.org/Manual/ess.html#ESS-developer) which allows
seamless evaluation of the code directly into the package environment
and namespace instead of the .GlobalEnv. So far so good, no problems at
all.

Unless I have missed something very basic, I am pretty optimistic about
all this story:)

Vitalie

Hadley Wickham

2012-08-29 17:16:15 UTC

Post by Vitalie Spinu
Indeed, this is a slight complication which is solved (is it?) by
stacking the evaluation environments. That is, each new code chunk is
evaluated in an environment whose parent is the environment from the
previous evaluation.

Yes, I think that would work (although I'm not sure what will happen
with the S4 method tables)

Post by Vitalie Spinu
Hmm, indeed, I have been overoptimistic. But S3 methods are just
functions, and automatic detection is not possible anyway. The user
has to declare them as S3 methods anyway. So nothing is lost with
respect to the current implementation.

See https://github.com/hadley/roxygen3/blob/master/R/s3.r for my code
to determine if a function is an S3 generic or method. I think it
should be pretty robust.

Post by Vitalie Spinu
Very simple one - just functions, no dispatch no objects. You will need
something similar anyhow, as there are no objects at that stage, and no
OO system can be used.

True. I wavered back and forth on using environments vs. specially
named functions.

Post by Hadley Wickham
Why shouldn't these functions just be single-method classes
that inherit from RoxyDetector or similar?

I was thinking more that instead of

roxy_env_inspector <- new.env(parent = emptyenv())
roxy_env_inspector[["S3"]] <- ...
roxy_env_inspector[["S4"]] <- ...

you'd do

setClass("RoxyInspector")
setClass("S4RoxyInspector", contains = c("RoxyInspector"))
setMethod("inspector", signature("S4RoxyInspector"), ...)
setClass("S3RoxyInspector", contains = c("RoxyInspector"))
setMethod("inspector", signature("S3RoxyInspector"), ...)

and then retrieve all the inspectors by using an S4 introspection
function to get all children of RoxyInspector.
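
One way this could look, as a hedged sketch rather than anything taken
from roxygen3: the subclass names (AssignInspector, S4ClassInspector),
the generic's signature, and the use of the `subclasses` slot of the
class definition are all assumptions made for illustration.

```r
library(methods)

# Virtual base class; each concrete inspector is a subclass of it.
setClass("RoxyInspector", representation("VIRTUAL"))
setGeneric("inspector", function(x, env) standardGeneric("inspector"))

# Hypothetical inspector for plainly assigned objects.
setClass("AssignInspector", contains = "RoxyInspector")
setMethod("inspector", "AssignInspector", function(x, env) ls(env))

# Hypothetical inspector for S4 classes defined in `env`.
setClass("S4ClassInspector", contains = "RoxyInspector")
setMethod("inspector", "S4ClassInspector",
          function(x, env) getClasses(where = env))

# Introspection: names of the known subclasses of the base class.
inspector_classes <- names(getClass("RoxyInspector")@subclasses)

env <- new.env()
assign("a", 1, envir = env)

# Instantiate each inspector and run it against the environment.
found <- lapply(inspector_classes, function(cl) inspector(new(cl), env))
names(found) <- inspector_classes
str(found)
```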

Post by Vitalie Spinu
Whatever the textual representation by which object X is generated, X <-
new(..), or X <- X.constructor(), or createObjectX(), or X <- eval(...),
the result will always be the same. It is much easier for the end user,
who is not forced to use a specific declaration for roxygen to work. All
that matters is the end object(s) which the code generates.

Ok, agreed. If you can do it, I'll happily include the code :)

Post by Vitalie Spinu
I still find the object_from_call mechanism a bit tricky, it's not only
the call name, but also the object name that plays the role. So the call
parser should be very smart, not only about how to detect the call name,
but also on how to detect the name of the object. Am I missing anything?

Ooops that's a bug - name is never used - it's always computed from
the call in the individual "methods".

But I think you need to have both the value of the object and its
name - either alone is not enough to be able to fully document the
object.

Post by Vitalie Spinu
I have done this with ess-developer
(http://ess.r-project.org/Manual/ess.html#ESS-developer) which allows
seamless evaluation of the code directly into the package environment
and namespace instead of the .GlobalEnv. So far so good, no problems at
all.

Ok, cool! We should probably share approaches - Winston contributed a
pretty complete namespace environment generator for devtools:
https://github.com/hadley/devtools/blob/master/R/namespace-env.r

Post by Vitalie Spinu
Unless I have missed something very basic, I am pretty optimistic about
all this story:)

Great - I'm very happy to be proved wrong.

Hadley

Vitalie Spinu

2012-08-29 18:25:14 UTC

Hadley Wickham <hadley at rice.edu>

I was thinking more that instead of
roxy_env_inspector <- new.env(parent = emptyenv())
roxy_env_inspector[["S3"]] <- ...
roxy_env_inspector[["S4"]] <- ...
you'd do
setClass("RoxyInspector")
setClass("S4RoxyInspector", contains = c("RoxyInspector"))
setMethod("inspector", signature("S4RoxyInspector"), ...)
setClass("S3RoxyInspector", contains = c("RoxyInspector"))
setMethod("inspector", signature("S3RoxyInspector"), ...)
and then retrieve all the inspectors by using a S4 introspection
function to get all children of RoxyInspector.

:D

I see. Use S4 for storage. Well, a bit too fancy for me ;).

As an alternative you can just provide a function
roxy_register_detector(fun) which would just store fun in an environment
in roxy namespace. Simple, and everyone understands what happens.
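
A minimal sketch of such a registry. Only the name
roxy_register_detector comes from the message above; the storage
environment roxy_detectors, the helper run_detectors, and the two
example detectors are hypothetical.

```r
# Hypothetical storage; in roxygen this environment would live in the
# package namespace.
roxy_detectors <- new.env(parent = emptyenv())

# Store a detector function in the registry.
roxy_register_detector <- function(fun) {
  stopifnot(is.function(fun))
  assign("funs", c(roxy_detectors$funs, list(fun)), envir = roxy_detectors)
  invisible(fun)
}

# Call every registered detector on an evaluation environment.
run_detectors <- function(env) {
  lapply(roxy_detectors$funs, function(detect) detect(env))
}

# Detector 1: every normally assigned object.
roxy_register_detector(function(env) ls(env))

# Detector 2: only the functions among them.
roxy_register_detector(function(env) {
  nms <- ls(env)
  nms[vapply(nms, function(n) is.function(get(n, envir = env)), logical(1))]
})

env <- new.env()
evalq({ x <- 1; f <- function() x }, env)

res <- run_detectors(env)
str(res)
```

A package with unusual side effects would call roxy_register_detector
with its own function; nothing else in the base code has to change.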