[Fwd: [RFC] RDF metadata API draft]


Michael Stahl-3

argh, like crossposting would work with our news server...

hope this one will reach xml-dev...


-------- Original Message --------
From: - Mon Mar 17 16:36:09 2008


[X-posted to xml-dev & bibliographic-dev;
  please send reply to interface-discuss]

hello interface-discuss!

i am currently designing an API that would be used for implementing the
ODF metadata specification (part of ODF 1.2) [1].
this spec (and, consequently, this API) allows for attaching meta data in
RDF (Resource Description Framework [2]) to ODF packages, and to ODF
content elements.
so i would be especially interested in input from people who would like to
use ODF metadata: does this API do what you need?

[1]
http://www.oasis-open.org/committees/documents.php?wg_abbrev=office-metadata
[2] http://www.w3.org/RDF/

until now, i have tried to only put in stuff that i think is really
necessary. so this API is, in some sense, rather minimalistic.
a major missing feature would be inference based on schemas, whether RDFS
or OWL, but i am not sure whether we really need that?

another missing feature would be support for transactions. that would
depend on the backend anyway, and would be optional.

also, note that this stuff does not pass idlc yet...
oh, and please refrain from telling me i need to split this up into
multiple files, i _know_ that.

for some interfaces i cannot seem to come up with a name that i like.
i believe naming things adequately is important for usability of the API.
if someone can suggest a better name, i would be happy.
note that points for which input is especially valued are conveniently
marked with FIXME :)

i've dumped the draft API on the OOo wiki:
http://wiki.services.openoffice.org/wiki/Metadata_API

regards,
michael stahl


--
"Name ist Schall und Rauch."
  -- Johann Wolfgang Goethe, "Faust: Der Tragödie erster Teil" (3457)

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [Fwd: [RFC] RDF metadata API draft]

Bruce D'Arcus
cc-ing ODF metadata list; re
<http://wiki.services.openoffice.org/wiki/Metadata_API>.

Michael Stahl wrote:

> i am currently designing an API that would be used for implementing the
> ODF metadata specification (part of ODF 1.2) [1].

Nice!

FWIW, I just forwarded your note to the W3C semweb list, hoping some of
the experts over there will have time to take a look.

A couple of quick questions (more than firm statements) ...


1)

On:

"RDFa statements are handled specially because they are not logically
part of any graph. (Don't blame me.)"

Am *really* not sure about this, but I wonder if it would make sense to
consider the document itself a graph? This might have other benefits,
but also would maybe allow you to not treat RDFa differently from the
perspective of the API?

That would also then fix your later "gets the names of all the graphs in
the repository. //FIXME: except "unspecified" graphs for RDFa, i guess..."

2) "executes a SPARQL "SELECT" query."

Oooh ... nice! Do I thus assume the endpoint can be arbitrary? E.g. if I
have an endpoint setup on a relational database somewhere, I can query
that from the OOo API?

If yes, that would be killer.

FWIW, I've been playing with an RDF store using ARC2, which has support
for the SPARQL update functions (which I don't think are officially part
of the W3C spec). As you would expect, it allows me to insert and
delete triples over SPARQL.

3) On:

"/** creates a graph with the given name.

         <p>
         The name must be unique within the repository.
         The name must be a valid file name in an ODF package.
//FIXME: clarify a bit more?
         </p>"

Wouldn't it be easier and better to just require a URI for the name?

That would fix this problem too:

"    /** creates a graph with the given name.

         <p>
         The name must be unique within the repository.
         The name must be a valid file name in an ODF package.
//FIXME: clarify a bit more?
         </p>"

Just require the full URI and be done with it. I see no downside.

Bruce






Re: [Fwd: [RFC] RDF metadata API draft]

Michael Stahl-3

Hi Bruce,

thanks for your prompt reply!

Bruce D'Arcus wrote:

> cc-ing ODF metadata list; re
> <http://wiki.services.openoffice.org/wiki/Metadata_API>.
>
> Michael Stahl wrote:
>
>> i am currently designing an API that would be used for implementing the
>> ODF metadata specification (part of ODF 1.2) [1].
>
> Nice!
>
> FWIW, I just forwarded your note to the W3C semweb list, hoping some of
> the experts over there will have time to take a look.

good idea!

> A couple of quick questions (more than firm statements) ...
>
>
> 1)
>
> On:
>
> "RDFa statements are handled specially because they are not logically
> part of any graph. (Don't blame me.)"
>
> Am *really* not sure about this, but I wonder if it would make sense to
> consider the document itself a graph? This might have other benefits,
> but also would maybe allow you to not treat RDFa differently from the
> perspective of the API?
>
> That would also then fix your later "gets the names of all the graphs in
> the repository. //FIXME: except "unspecified" graphs for RDFa, i guess..."

hmmm... let me see... this whole RDFa business has already given me
headaches, and i am not really happy with the way it is treated
differently in the API, but this is really the best i could come up with...

In ODF, the named graphs correspond to RDF/XML files.
The RDFa statements belong to ODF content files (either content.xml or
styles.xml).
The difference between RDF/XML and content files is that you can add
arbitrary RDF statements to RDF/XML files, but you can only add a
syntactically restricted form of RDF statements, namely those that can be
expressed with RDFa as specified in the ODF metadata spec, to the content
files.
So, the XRDFNamedGraph interface would represent RDF/XML files, and has a
generally applicable addStatement method.
What we could do, is to say that there are (in addition to the RDF/XML
ones) two named graphs "content.xml" and "styles.xml", and the
addStatement implementation would then enforce constraints on the
parameters for these two particular named graphs, throwing, say,
IllegalArgumentException for statements that cannot be represented in RDFa
in the ODF content files.
(Or, worse, the API could accept such statements, insert them, and then
fail to store them in the ODF file; I think we can rule out such an
"implementation approach" :) )
But I much prefer two distinct methods for these two use cases of RDF/XML
and RDFa, and I would really hate to document a method that handles both
(just look how long the documentation of addStatementRDFa is already), and
I bet users would be really confused by such a method that has wildly
different expectations of correct parameters depending on which object it
is called on.
Oh, and another thing: we need to be able to find out (in the
implementation) whether an ODF element has RDFa metadata attached
(obviously, when we store a document).
This is stored in the RDF repository, so how do you locate the RDFa
statements there that belong to a particular object representing an ODF
element?
My idea was that, since every RDFa-bearing ODF element may also have an XML
ID, we just guarantee that every instance of an ODF element that actually
has RDFa attached will have an XML ID, and then use that XML ID as the
context (i.e. graph name).
Given the ODF element, we can then just enumerate that named graph to find
all (0 or 1 or 2) RDFa statements.
That is what the "unspecified named graph" in the API is about (in case
you wondered).
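
A minimal sketch of the XML-ID-as-context idea, written in Python rather than UNO IDL. The class and method names here are made up for illustration and are not part of the draft API; the only thing taken from the text above is that the element's XML ID serves as the context (graph name) for its RDFa statements.

```python
from collections import defaultdict


class Repository:
    """Toy in-memory store mapping a context (graph name) to its statements."""

    def __init__(self):
        self._by_context = defaultdict(set)

    def add_statement_rdfa(self, xml_id, subject, predicate, obj):
        # Hypothetical stand-in for addStatementRDFa: the XML ID of the
        # ODF element is used as the context of the statement.
        self._by_context[xml_id].add((subject, predicate, obj))

    def rdfa_statements_for(self, xml_id):
        # Enumerate the "unspecified" graph that is named by the XML ID.
        return sorted(self._by_context.get(xml_id, ()))

    def has_rdfa(self, xml_id):
        # What the document-storing code needs to know per element.
        return bool(self._by_context.get(xml_id))


repo = Repository()
repo.add_statement_rdfa(
    "id42", "urn:example:doc#id42", "urn:example:author", "Jane Doe"
)
```

With this layout, finding out whether an element carries RDFa is a single lookup by XML ID and does not require scanning the repository.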

 > Am *really* not sure about this, but I wonder if it would make sense to
 > consider the document itself a graph?

hmmm, interesting idea.
If I understand the RDF model correctly, then a document would have a
graph associated, which consists of a set of named graphs (the RDF/XML files).
So the whole document is a graph, and every single RDF/XML file is a
subset of the document graph, and _also_ its own named graph.
But: we cannot add an RDF statement to the "whole document graph", while
not adding it to any named graph; if we allowed a statement that is not
part of any named graph, where would we store it in the ODF file
(especially in a way that is interoperable with other implementations of
the ODF metadata spec)?
So I would claim that it does not make sense to expose in the API a "whole
document graph"; you just add your statements to some named graph, and
they end up in the "whole document graph" as a side effect.
The drawback of this approach is that you cannot enumerate the "whole
document graph".
But you can still query it with SPARQL, and it seems to me that is good
enough.
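
The relationship described above, where nothing is written to the whole document graph directly and it is simply the union of the named graphs, could be sketched roughly like this. This is plain Python with hypothetical names (DocumentMetadata, create_graph, whole_document_graph), not the draft API.

```python
class DocumentMetadata:
    """Toy model: a document owns named graphs; their union is the document graph."""

    def __init__(self):
        self._graphs = {}

    def create_graph(self, name):
        # The name must be unique within the repository.
        if name in self._graphs:
            raise ValueError(f"graph already exists: {name}")
        self._graphs[name] = set()

    def add_statement(self, graph_name, subject, predicate, obj):
        # Statements can only be added to a named graph.
        self._graphs[graph_name].add((subject, predicate, obj))

    def whole_document_graph(self):
        # The document graph is never stored; it is derived as a union.
        merged = set()
        for triples in self._graphs.values():
            merged |= triples
        return merged


doc = DocumentMetadata()
doc.create_graph("urn:example:graph-a")
doc.create_graph("urn:example:graph-b")
doc.add_statement("urn:example:graph-a", "urn:example:doc", "urn:example:title", "Draft")
doc.add_statement("urn:example:graph-b", "urn:example:doc", "urn:example:author", "Jane Doe")
```

Here whole_document_graph() only exists to make the "side effect" visible; in the API as drafted, that view would be reachable through a SPARQL query instead.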

oh, and if you think i have misunderstood something, please tell me!
i only started reading up on RDF a month ago...

> 2) "executes a SPARQL "SELECT" query."
>
> Oooh ... nice! Do I thus assume the endpoint can be arbitrary? E.g. if I
> have an endpoint setup on a relational database somewhere, I can query
> that from the OOo API?

That was the idea, yes. The XRDFRepository interface can be implemented
either by an in-memory repository (as would be done for ODF documents), or
by some database, accessed via HTTP or SQL or whatever.
However, we want to have one repository per document, and the database
repository would be distinct as well, so you would not be able to write a
SPARQL query across document metadata and a database; you would have to
write two queries.
Of course, the first iteration of the implementation will only have
in-memory repositories (for lack of time).
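
To illustrate the "two queries" point: a rough Python sketch with one interface and two backends, one in memory and one backed by SQL (sqlite3 from the standard library stands in for "some database"). All names are hypothetical, and select() is only a very reduced stand-in for a SPARQL SELECT query.

```python
import sqlite3
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical stand-in for XRDFRepository."""

    @abstractmethod
    def select(self, predicate):
        """Return (subject, object) pairs of all statements with this predicate."""


class InMemoryRepository(Repository):
    def __init__(self, triples):
        self._triples = list(triples)

    def select(self, predicate):
        return [(s, o) for s, p, o in self._triples if p == predicate]


class SqlRepository(Repository):
    def __init__(self, connection):
        self._conn = connection

    def select(self, predicate):
        rows = self._conn.execute(
            "SELECT s, o FROM triples WHERE p = ?", (predicate,)
        )
        return [(s, o) for s, o in rows]


TITLE = "urn:example:title"

# One repository per document ...
document_repo = InMemoryRepository([("urn:example:doc", TITLE, "Draft")])

# ... and a separate one for the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?)",
    ("urn:example:book", TITLE, "Faust"),
)
database_repo = SqlRepository(conn)

# No single query spans both: run one per repository and merge the results.
combined = document_repo.select(TITLE) + database_repo.select(TITLE)
```

The merge happens in the calling code, which is the price of keeping the repositories distinct.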

> If yes, that would be killer.
>
> FWIW, I've been playing with an RDF store using ARC2, which has support
> for the SPARQL update functions (which I don't think are officially part
> of the W3C spec). As you would expect, it allows me to insert and delete
> triples over SPARQL.

SPARQL update? haven't heard of that yet...
but i guess if it is not standard yet we should not provide an API for it now.

> 3) On:
>
> "/** creates a graph with the given name.
>
>         <p>
>         The name must be unique within the repository.
>         The name must be a valid file name in an ODF package.
> //FIXME: clarify a bit more?
>         </p>"
>
> Wouldn't it be easier and better to just require a URI for the name?
>
> That would fix this problem too:
>
> "    /** creates a graph with the given name.
>
>         <p>
>         The name must be unique within the repository.
>         The name must be a valid file name in an ODF package.
> //FIXME: clarify a bit more?
>         </p>"
>
> Just require the full URI and be done with it. I see no downside.

hmm, yes, good idea.
Actually, if we want the interface that represents graphs to inherit from
XRDFResource, then graph names must be URIs, right?

So, the graph name must be a URI, and it must represent the file name
somehow.
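
As a rough illustration of "require a URI for the name", here is a Python sketch of a graph-creation function that rejects names which are not absolute URIs. The function names are hypothetical, and the check (a scheme plus something after it) is deliberately loose; it is an assumption about what "valid" could mean, not a rule from the spec.

```python
from urllib.parse import urlparse


def is_absolute_uri(name):
    # Loose check: a scheme followed by an authority or a path.
    parts = urlparse(name)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def create_graph(graphs, name):
    """Add an empty graph under the given name to the dict of graphs."""
    if not is_absolute_uri(name):
        raise ValueError(f"graph name is not an absolute URI: {name!r}")
    if name in graphs:
        raise ValueError(f"graph name is not unique: {name!r}")
    graphs[name] = set()


graphs = {}
create_graph(graphs, "http://example.org/graphs/meta.rdf")
create_graph(graphs, "urn:example:graph-b")
```

A bare file name such as "meta.rdf" has no scheme and would be rejected by this check.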


regards,
Michael Stahl


--
"Dealing with failure is easy: Work hard to improve.
  Success is also easy to handle: You've solved the wrong problem.
  Work hard to improve." -- Alan Perlis

Re: [Fwd: [RFC] RDF metadata API draft]

Bruce D'Arcus
On Wed, Mar 19, 2008 at 7:39 AM, Michael Stahl <[hidden email]> wrote:

...

>   > Am *really* not sure about this, but I wonder if it would make sense to
>   > consider the document itself a graph?
>
>  hmmm, interesting idea.
>  If I understand the RDF model correctly, then a document would have a
>  graph associated, which consists of a set of named graphs (the RDF/XML files).
>  So the whole document is a graph, and every single RDF/XML file is a
>  subset of the document graph, and _also_ its own named graph.

Yes, except I'm suggesting that the content files also have their own graphs.

>  But: we cannot add an RDF statement to the "whole document graph", while
>  not adding it to any named graph; if we allowed a statement that is not
>  part of any named graph, where would we store it in the ODF file
>  (especially in a way that is interoperable with other implementations of
>  the ODF metadata spec)?

Right.

>  So I would claim that it does not make sense to expose in the API a "whole
>   document graph"; you just add your statements to some named graph, and
>  they end up in the "whole document graph" as a side effect.
>  The drawback of this approach is that you cannot enumerate the "whole
>  document graph".

Except as a collection of named graphs, which is fine.

>  But you can still query it with SPARQL, and it seems to me that is good
>  enough.
>
>  oh, and if you think i have misunderstood something, please tell me!
>  i only started reading up on RDF a month ago...
>
>
>  > 2) "executes a SPARQL "SELECT" query."
>  >
>  > Oooh ... nice! Do I thus assume the endpoint can be arbitrary? E.g. if I
>  > have an endpoint setup on a relational database somewhere, I can query
>  > that from the OOo API?
>
>  That was the idea, yes. The XRDFRepository interface can be implemented
>  either by an in-memory repository (as would be done for ODF documents), or
>  by some database, accessed via HTTP or SQL or whatever.
>  However, we want to have one repository per document, and the database
>  repository would be distinct as well, so you would not be able to write a
>  SPARQL query across document metadata and a database; you would have to
>  write two queries.

OK, but I think a single SPARQL query can point to multiple repositories (?).

...

>  So, the graph name must be a URI, and it must represent the file name
>  somehow.

The URI is a name for a graph, which in the context of ODF, gets
serialized as RDF/XML. In that sense, the file name is rather
orthogonal.

Bruce

Re: [Fwd: [RFC] RDF metadata API draft]

Michael Stahl-3
Bruce D'Arcus wrote:

> On Wed, Mar 19, 2008 at 7:39 AM, Michael Stahl <[hidden email]> wrote:
>
> ...
>
>>   > Am *really* not sure about this, but I wonder if it would make sense to
>>   > consider the document itself a graph?
>>
>>  hmmm, interesting idea.
>>  If I understand the RDF model correctly, then a document would have a
>>  graph associated, which consists of a set of named graphs (the RDF/XML files).
>>  So the whole document is a graph, and every single RDF/XML file is a
>>  subset of the document graph, and _also_ its own named graph.
>
> Yes, except I'm suggesting that the content files also have their own graphs.

hmm... we could do this.
But there are some caveats:
For one, you cannot add arbitrary statements to a content file graph.
In fact, the RDFa methods at XRDFRepository are sufficient for
manipulating content files, and (as described in the message you quoted) i
would rather not allow using addStatement to manipulate the content graph.
Also, we want to easily identify the RDFa statements for an individual ODF
element (as described above, in those three dots).
Using the XML ID as the context seems expedient to me.
One thing I like about Sesame 2 is that it supports a list of contexts
when you add a statement.
Redland RDF, OTOH, only seems to support one context per statement.

So, the only thing that you could do with a content file graph is
enumerate it.
But you are right, that is useful functionality to have.
What about a method XRDFRepository::getRDFaStatements(Subject, Predicate,
Object)?
Or we could just have a content file graph, and its addStatement method
would just throw IAmAContentGraph_UseAddStatementRDFa_Exception :)
Or, alternatively, we split out the concept of adding statements from
XRDFNamedGraph, so that we have two interfaces, one without addStatement,
and another, inheriting from the first, that adds addStatement.

hmm, i think i prefer the first alternative.
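
For comparison, the third alternative (splitting the interface so that only one of the two offers addStatement) might look roughly like this in Python. The class names are hypothetical and only mirror the idea, not the actual UNO interfaces.

```python
from abc import ABC, abstractmethod


class ReadableGraph(ABC):
    """Base interface: enumeration only, no addStatement."""

    @abstractmethod
    def get_statements(self):
        """Return all statements in this graph."""


class WritableGraph(ReadableGraph):
    """Derived interface: inherits enumeration and adds addStatement."""

    @abstractmethod
    def add_statement(self, subject, predicate, obj):
        """Add one statement to this graph."""


class ContentGraph(ReadableGraph):
    # Content file graph: can be enumerated, but has no add_statement at all.
    def __init__(self, rdfa_statements):
        self._statements = list(rdfa_statements)

    def get_statements(self):
        return list(self._statements)


class RdfXmlGraph(WritableGraph):
    # RDF/XML file graph: accepts arbitrary statements.
    def __init__(self):
        self._statements = []

    def get_statements(self):
        return list(self._statements)

    def add_statement(self, subject, predicate, obj):
        self._statements.append((subject, predicate, obj))


content = ContentGraph([("urn:example:doc#id42", "urn:example:author", "Jane Doe")])
meta = RdfXmlGraph()
meta.add_statement("urn:example:doc", "urn:example:title", "Draft")
```

With the split, misuse of a content graph shows up as a missing method rather than as an exception at run time, at the cost of one more interface to name and document.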

>>  But: we cannot add an RDF statement to the "whole document graph", while
>>  not adding it to any named graph; if we allowed a statement that is not
>>  part of any named graph, where would we store it in the ODF file
>>  (especially in a way that is interoperable with other implementations of
>>  the ODF metadata spec)?
>
> Right.
>
>>  So I would claim that it does not make sense to expose in the API a "whole
>>   document graph"; you just add your statements to some named graph, and
>>  they end up in the "whole document graph" as a side effect.
>>  The drawback of this approach is that you cannot enumerate the "whole
>>  document graph".
>
> Except as a collection of named graphs, which is fine.

hmmm, actually we could have a method
XRDFRepository::getStatements(Subject, Predicate, Object)
that just enumerates the whole repository.
That should be easy to implement.
Do you think this is a useful addition?
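
A rough Python sketch of what such a getStatements could do. It assumes (this is not stated in the draft) that leaving a parameter empty means "match anything", and it uses a plain dict of graph name to statement list as a stand-in for the repository.

```python
def get_statements(graphs, subject=None, predicate=None, obj=None):
    """Yield (graph_name, statement) for every matching statement.

    Assumption for this sketch: None acts as a wildcard.
    """
    pattern = (subject, predicate, obj)
    for name, statements in graphs.items():
        for statement in statements:
            if all(want is None or want == got
                   for want, got in zip(pattern, statement)):
                yield name, statement


TITLE = "urn:example:title"
AUTHOR = "urn:example:author"

graphs = {
    "urn:example:graph-a": [("urn:example:doc", TITLE, "Draft")],
    "urn:example:graph-b": [
        ("urn:example:doc", AUTHOR, "Jane Doe"),
        ("urn:example:other", TITLE, "Faust"),
    ],
}

everything = list(get_statements(graphs))
titles = list(get_statements(graphs, predicate=TITLE))
```

Calling it with no arguments enumerates the whole repository; passing a predicate narrows the result across all named graphs at once.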

>>  But you can still query it with SPARQL, and it seems to me that is good
>>  enough.
>>
>>  oh, and if you think i have misunderstood something, please tell me!
>>  i only started reading up on RDF a month ago...
>>
>>
>>  > 2) "executes a SPARQL "SELECT" query."
>>  >
>>  > Oooh ... nice! Do I thus assume the endpoint can be arbitrary? E.g. if I
>>  > have an endpoint setup on a relational database somewhere, I can query
>>  > that from the OOo API?
>>
>>  That was the idea, yes. The XRDFRepository interface can be implemented
>>  either by an in-memory repository (as would be done for ODF documents), or
>>  by some database, accessed via HTTP or SQL or whatever.
>>  However, we want to have one repository per document, and the database
>>  repository would be distinct as well, so you would not be able to write a
>>  SPARQL query across document metadata and a database; you would have to
>>  write two queries.
>
> OK, but I think a single SPARQL query can point to multiple repositories (?).

well, that depends very much on the underlying implementation...

Actually i thought that Sesame 2 could do this, but no: it only has a
StackableSail which allows for creating a repository by stacking basic
blocks like a store and an inferencer.

Redland RDF cannot do this, which is bad enough, because we want to be
able to implement the whole API with it.
(Unfortunately, we cannot depend on Java for core OOo functionality.)

So, in summary, you can only query one repository with one SPARQL query.

> ...
>
>>  So, the graph name must be a URI, and it must represent the file name
>>  somehow.
>
> The URI is a name for a graph, which in the context of ODF, gets
> serialized as RDF/XML. In that sense, the file name is rather
> orthogonal.

looking at the spec again i see that the manifest contains a pkg:path
property for the file name. Sorry, had forgotten that. Of course, in that
case what i wrote above does not make any sense.

Michael


--
PUBLIC NOTICE AS REQUIRED BY LAW:   Any Use of This Product, in Any
Manner Whatsoever, Will Increase the Amount of Disorder in the Universe.
Although No Liability Is Implied Herein, the Consumer Is Warned That
This Process Will Ultimately Lead to the Heat Death of the Universe.
