html code generated from Open Office


html code generated from Open Office

Howard Morris (aka Col Boogie)
The reason I joined OpenOffice was to improve the HTML code generated by Writer.

For now, I have written an HTML page that uploads an HTML file generated by OpenOffice, plus PHP code that tweaks that HTML and returns an improved version for download. Everything I have done so far is in the attached zzz.zip file; explanations are in Readme.txt.

I would have preferred to make the change directly in OpenOffice, but when I asked how to do that, I was directed to a site where I could download the modules one by one. There seemed to be hundreds of them, with no indication of what any of them contained and only questionable directions on how to put them together. That is not how I like to work, so for now I took the other route.

I seem to have come across documentation saying that OpenOffice stores its files internally in an XML format. If I could extract the XML directly from the .odt file, I could do everything from there. Assuming that is true, is there an updated version of https://www.openoffice.org/xml/xml_specification.pdf? I would hope that whoever maintains the documentation keeps it up to date. It will take me at least 20 hours to read that document, but at least I will retain most of it the first time.
 
Howard Morris


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Attachment: zzz.zip (27K)

RE: html code generated from Open Office

Dennis E. Hamilton
You need to download and read the OASIS ODF Specification.  Since you are interested specifically in Writer, you might as well start with ODF 1.1 (a single document).  You can get all of the bits at
http://docs.oasis-open.org/office/v1.1/OS/.
 
The ODT file is a Zip file that has multiple parts of the document as XML files within the Zip.  
 
You can see such a file by renaming it.  E.g., rename test.odt to test.odt.zip and open it as a Zip.  Then rename it back.
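Renaming is optional; any Zip library can open the file directly. A minimal Python sketch using only the standard library lists the parts of the package and pulls the paragraph and heading text out of content.xml. The function names are hypothetical, and the file name test.odt is only an example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI for ODF text elements (text:p, text:h, text:span, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARA_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def list_parts(path):
    """Return the names of all parts stored inside an ODF package."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def paragraphs(path):
    """Return the plain text of every text:p and text:h element in content.xml."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    result = []
    for elem in root.iter():
        if elem.tag in PARA_TAGS:
            # itertext() concatenates text from nested spans, links, etc.
            result.append("".join(elem.itertext()))
    return result
```

For example, `paragraphs("test.odt")` returns one string per paragraph or heading. Spacing elements such as text:s and text:tab are not expanded, so repeated spaces and tabs are lost in this sketch.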
 
The specification for the OpenOffice 1.0 XML format does not apply to any version of OpenOffice released since ODF was adopted and implemented.
 
Also, the special Office Writer Web format is not singled out in the ODF specification.  What I suggest you do is make test documents and save them as HTML Document (OpenOffice Writer) and see what is produced.  Also, open such a document in Writer to see how it comes up.
 
The latest specification is ODF 1.2.  It is much larger, and only Parts 1 and 3 apply to Writer.  (Part 2 is all about spreadsheet formulas.)
 
You also might want to take a look at the new Corinthia project which is interested in document conversions using HTML as an intermediary.
 
-   Dennis
 
 
 
From: Howard Morris (aka Col Boogie) [mailto:[hidden email]]
Sent: Monday, January 5, 2015 20:36
To: [hidden email]
Subject: html code generated from Open Office
 

Re: html code generated from Open Office

Andrea Pescetti-2
In reply to this post by Howard Morris (aka Col Boogie)
On 06/01/2015 Howard Morris (aka Col Boogie) wrote:
> The reason I joined Open Office was to enhance the html code generated
> from writer.

Good! There are a number of valid points in your ZIP file. There are also
several possible ways to produce HTML from within OpenOffice, which do
different things, so you should specify which operation you are going to
improve (I assume that you are using, in Writer, File - Save As - HTML;
but File - Export offers a different one if it is installed on your
system).

> I would have liked to have done this directly, but when I asked how to
> get there, I was directed to a site where I could download all the
> modules one by one, and there seemed to be hundreds of them

The source code is indeed quite big and monolithic, but you could simply
hack the conversion files. See
https://archive.fosdem.org/2014/schedule/event/improving_the_xhtml_export_filter/ 
for more (note: video is from a test, it's not even me in the video; use
the slides). Please come back with any questions. We can probably reuse
the concepts from your ZIP file, but they must be adapted.

Regards,
   Andrea.


Re: html code generated from Open Office

Regina Henschel
In reply to this post by Howard Morris (aka Col Boogie)
Hi Howard,

Howard Morris (aka Col Boogie) wrote:
> The reason I joined Open Office was to enhance the html code generated
> from writer.

Writer/Web is currently not maintained.

> For now, I have constructed html code to upload an html file generated
> by Open Office

OpenOffice generates bad HTML using Writer/Web.

> and PHP code to tweak that code and download a better
> version. Everything I have doe so far is in the attached zzz.zip file.
> Explanations are in Readme.txt

I would not go that way.

> I would have liked to have done this directly, but when I asked how to
> get there, I was directed to a site where I could download all the
> modules one by one, and there seemed to be hundreds of them and no
> indication what any of them contained with dubious directions how to put
> them together. This is not how I like to do things, so I went the other
> way for now.

If you want to improve the Writer/Web module, you do indeed need to work
directly on the code. But if you do not like C++ coding and the effort of
building your own OpenOffice, there are alternatives.

OpenOffice can execute XSLT. Please open a text document and then try
File > Export, choosing the XHTML type. You get a nice XHTML document.
The shortcomings are that it currently supports only simple structures,
and that it works only for export; import is missing.

Go to Tools > XML Filter Settings. That is the manager for XSLT filters.

You will find the XHTML XSLT files themselves in the folder
program/share/xslt/export/xhtml.

Improving this XHTML filter might better fit your interests.


> I seem to have run across documentation that Open Office puts its files
> internally into a XML format. If I could extract the XML directly from
> the .odt file I could do everything from there. Assuming that is true,
> is there an updated copy of
> https://www.openoffice.org/xml/xml_specification.pdf ?

There will never be an "update", and it is not relevant for your purpose.
It describes the format of the .sxw (.sxc, .sxm, ...) files, which
OpenOffice 1.1 used before ODF existed.

OpenOffice now uses ODF 1.2; you have already received some details. Here
is the link to the specifications: https://www.oasis-open.org/standards#opendocumentv1.2

> I would hope that
> whoever is doing the documentation keeps that up to date. It will take
> me at least 20 hours to read that document, but at least I will retain
> most of it the first time.

Other thoughts: what about using "flat ODF 1.2" (no container; everything
in one file) directly and providing style sheets for the browsers? Or
look at what the project http://webodf.org does, or at similar projects.
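To get a feel for the flat ODF idea (a .fodt file is one XML document rather than a Zip container), here is a small Python sketch that turns headings and paragraphs into HTML. It is not the XHTML export filter and does not use XSLT; it only imitates, for text:h and text:p, the kind of mapping such a filter performs. The function name is hypothetical, and tables, lists, frames and styles are deliberately skipped.

```python
import html
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TEXT = "{%s}" % NS["text"]


def flat_odt_to_html(xml_source):
    """Convert the top-level headings and paragraphs of a flat ODF text document to HTML."""
    root = ET.fromstring(xml_source)
    body = root.find("office:body/office:text", NS)
    if body is None:
        raise ValueError("not a flat ODF text document")
    out = []
    for elem in body:
        content = html.escape("".join(elem.itertext()))
        if elem.tag == TEXT + "h":
            # text:outline-level is 1-based; clamp it to the HTML range h1..h6.
            level = int(elem.get(TEXT + "outline-level", "1"))
            level = min(max(level, 1), 6)
            out.append(f"<h{level}>{content}</h{level}>")
        elif elem.tag == TEXT + "p":
            out.append(f"<p>{content}</p>")
        # Anything else (tables, lists, sections, ...) is ignored here.
    return "\n".join(out)
```

Adding CSS for the browser, as suggested above, would then be a matter of mapping text:style-name attributes to class names.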

Kind regards
Regina







Re: html code generated from Open Office

Andrea Pescetti-2
On 06/01/2015 Regina Henschel wrote:

> Howard Morris (aka Col Boogie) wrote:
>> The reason I joined Open Office was to enhance the html code generated
>> from writer. ...
> OpenOffice has the ability to execute XSLT. Please open a text document
> and then try File > Export > type XHTML. You get a nice XHTML document.
> The shortcoming is, that it currently only supports simple structures,
> and that it is only for export, and import is missing.
>
> Goto Tools > XML Filter Settings. That is the manager for XSLT filters.
>
> Find the XHTML XSLT files themselves in folder
> program/share/xslt/export/xhtml.
>
> Improving this XHTML filter might fit better to your interests.

Regina wrote it much better, but this is what I meant when I wrote that
you could "hack the conversion files". See my FOSDEM slides for more
(link in my other message) and if you go this way I can probably give
you some further hints.

Regards,
   Andrea.


Re: html code generated from Open Office

Howard Morris (aka Col Boogie)
In reply to this post by Dennis E. Hamilton
I’d like to thank everyone for their suggestions.
I think I will try Dennis Hamilton’s suggestions first.
I will continue my conversion approach for a while just to get a better understanding of what is going on.

When I get into the OpenOffice code, I will also try to look at the problem with copying and pasting from a browser window.
I have done it myself a number of times and have seen more problems than have been reported.

OpenOffice has trouble when I open an HTML file I have written myself, which is not unexpected.
I think JavaScript is especially worrisome; maybe we could convert scripts to macros.
I expect CSS is also a problem.

I can adapt to almost any computer language, but I will avoid APL (its code uses a lot of the Greek alphabet).

If I ever finish this task, there is a copy-and-paste problem from Calc to Writer I’d like to address.
Currently, a copy and paste ends up as an image in Writer; I’d like to see it become a table.
The functions (macros, anyone?) would be lost, but they are lost anyway now.

I also remember seeing emails about trouble with OpenOffice Math.
From using it, its markup looks like LaTeX code.
When I edit an OpenOffice Writer document in Word and save it in .odt format, the LaTeX code gets changed.
It would be nice to add an “Insert Greek letter” panel to the Math options.
If anyone wants to redo it, I know how to write math expressions in native HTML.
Basically, I use an (inline) table to hold all the symbols.
I do realize that OpenOffice currently cannot have a table in the middle of a paragraph.
However, I think we can sneak in a construct.

Howard

RE: html code generated from Open Office

Dennis E. Hamilton
Just a short note about tables in paragraphs, ...

 -- replying below to --
From: Howard Morris (aka Col Boogie) [mailto:[hidden email]]
Sent: Tuesday, January 6, 2015 11:41
To: [hidden email]; [hidden email]
Subject: Re: html code generated from Open Office

[ ... ]

I also remember seeing emails about trouble with Open Office Math.
From using it, it looks like LaTex code.
 
<orcmid>
   Well, it looks a little like TeX, but it isn't TeX.  
   It goes back to how StarMath was conceived.  Regina
   can provide much more about that.
</orcmid>
Editing Open Office Writer in Word and saving it in .odt format, the LaTex code gets changed.
It would be nice if we added an insert a “Greek letter” as a Math options panel.
If anyone wants to redo it, I know how to write math expressions using native html.
Basically I use an (inline) table to hold all the symbols.
I do realize that currently Open Office cannot have a table in the middle of a paragraph.
However, I think we can sneak  in a construct.

<orcmid>
  In the ODF Schema, table elements are text-content, not paragraph-content.  
  There are ways to introduce text content in certain paragraph-content elements,
  such as with text frames.  If you want to roam around in the schema, this
  might be fun for you: <http://nfoworks.org/notes/2014/05/n140504f.htm>.

  However, I think using MathML and full Unicode for symbols may be better in
  that particular case.
</orcmid>
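Dennis's point about the schema can be illustrated by building the nesting he describes: a table:table cannot be a direct child of text:p, but a paragraph can contain a draw:frame whose draw:text-box holds a table. The Python sketch below only constructs that XML shape with ElementTree; the function names are hypothetical, and the frame size, styles and other attributes a real document needs are omitted, so treat it as an outline of the structure rather than schema-validated output.

```python
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a Clark-notation qualified name such as {uri}p."""
    return "{%s}%s" % (NS[prefix], local)


def paragraph_with_inline_table(before, cells, after):
    """Return a text:p whose middle holds a one-row table wrapped in a frame."""
    p = ET.Element(q("text", "p"))
    p.text = before
    # Anchoring the frame as a character keeps it inline with the surrounding text.
    frame = ET.SubElement(p, q("draw", "frame"), {q("text", "anchor-type"): "as-char"})
    box = ET.SubElement(frame, q("draw", "text-box"))
    tbl = ET.SubElement(box, q("table", "table"))
    ET.SubElement(tbl, q("table", "table-column"),
                  {q("table", "number-columns-repeated"): str(len(cells))})
    row = ET.SubElement(tbl, q("table", "table-row"))
    for value in cells:
        cell = ET.SubElement(row, q("table", "table-cell"))
        ET.SubElement(cell, q("text", "p")).text = value
    frame.tail = after  # text that continues the paragraph after the frame
    return p
```

`ET.tostring(paragraph_with_inline_table("x = ", ["a", "b"], " end"), encoding="unicode")` shows the table sitting two levels below the paragraph, inside the frame's text box.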

Howard



Re: html code generated from Open Office

Howard Morris (aka Col Boogie)
Thank you again,
Brief comment: I know about MathML. The last time I tried it, I ran into a browser that didn’t support it. Hmm, let the user get a better browser; they are cheap enough. In this case there is no need for full Unicode: %beta becomes &beta; in HTML. My comment on Greek letters is that the prompts say nothing about Greek letters. I had to use Google to find out about %beta in OpenOffice. I suspect some people will not be able to figure it out.
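The %beta to &beta; correspondence can be automated for simple cases. A minimal Python sketch, assuming that StarMath writes lowercase Greek letters as %alpha and capitals as %ALPHA (italic and other variants are not handled), maps those names to HTML named entities and leaves anything that is not a Greek entity name untouched. The function name is hypothetical.

```python
import re
from html.entities import name2codepoint

# A StarMath-style symbol name such as %beta or %GAMMA.
SYMBOL = re.compile(r"%([A-Za-z]+)")

# Greek and Coptic Unicode block; used to accept only Greek letters.
GREEK_FIRST, GREEK_LAST = 0x0370, 0x03FF


def starmath_greek_to_html(text):
    """Replace %name Greek symbols with the matching HTML named entity."""
    def replace(match):
        name = match.group(1)
        # Assumed convention: an all-caps name is the capital letter (%BETA -> &Beta;).
        entity = name.capitalize() if name.isupper() else name
        codepoint = name2codepoint.get(entity)
        if codepoint is not None and GREEK_FIRST <= codepoint <= GREEK_LAST:
            return f"&{entity};"
        return match.group(0)  # not a Greek entity name: leave it unchanged
    return SYMBOL.sub(replace, text)
```

The Greek-block check keeps names such as %amp or %lt from being turned into unrelated HTML entities.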

Howard
Reply | Threaded
Open this post in threaded view
|

RE: html code generated from Open Office

Howard Morris (aka Col Boogie)
In reply to this post by Howard Morris (aka Col Boogie)
Sorry, I seem to have deleted a more recent version of this request.
Someone sent me the name of the module that gets invoked for a Save As HTML request.
I would also like to compare it with the module that handles printing and the module that handles print preview. What are the names of all those modules?
Where can I download them from?

Thanks Howard


________________________________
From: Howard Morris (aka Col Boogie) <[hidden email]>
Sent: Monday, January 5, 2015 10:36:20 PM
To: [hidden email]
Subject: html code generated from Open Office


Re: html code generated from Open Office

Damjan Jovanovic
Hi

In case it was me, this is what I said:
The code for the "Save As" -> "HTML document" feature seems to be in:

main/sc/source/filter/html
for Calc, and
main/sw/source/filter/html
for Writer.
(Not sure if there are more?)

Thank you for your contribution, and please let us know if you need any
further help.
Damjan

On Tue, Sep 4, 2018 at 2:55 AM Howard Cary Morris <
[hidden email]> wrote:


RE: html code generated from Open Office

Howard Morris (aka Col Boogie)
OK, where can I get a copy of that module for Writer?
I would also like a copy of the module that does printing and the one that does print preview.
(Mainly for comparison purposes.)
If there is documentation of the .odt file format, I might be able to write the module from scratch.

Howard

From: Damjan Jovanovic<mailto:[hidden email]>
Sent: Monday, September 3, 2018 11:20 PM
To: Apache OO<mailto:[hidden email]>; [hidden email]<mailto:[hidden email]>
Subject: Re: html code generated from Open Office


Re: html code generated from Open Office

Damjan Jovanovic
Those modules are in our source code, see this link about getting it:
https://openoffice.apache.org/source.html
Building it is quite involved :-/.

What are you looking for in terms of printing?

ODT files are documented but it's a lot of documentation:
https://www.oasis-open.org/standards#opendocumentv1.2

Damjan


On Wed, Sep 12, 2018 at 10:00 PM Howard Cary Morris <
[hidden email]> wrote:


RE: html code generated from Open Office

Howard Morris (aka Col Boogie)
The only reason I want to see the print and print preview code is that I want the HTML5 output to look identical to the printed output. I will have additional references to help me understand the code.



________________________________
From: Damjan Jovanovic <[hidden email]>
Sent: Wednesday, September 12, 2018 7:59:53 PM
To: Howard Cary Morris
Cc: Apache OO
Subject: Re: html code generated from Open Office


RE: html code generated from Open Office

Howard Morris (aka Col Boogie)

Looked at the code, or tried to anyway. Way too many includes, which I don’t know how to look up.

I’d be better off with the code for each module as the compiler sees it, with all includes expanded.
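When the #include lines are the obstacle, a small script can at least list them and show which ones resolve to files in the source tree. A minimal Python sketch; the include search directories are whatever you pass in (which directories matter in the OpenOffice tree is an assumption left to the caller), and system or generated headers not found on those paths are reported as unresolved (None). The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches #include "file.hxx" and #include <file.hxx>, allowing spaces after '#'.
INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def list_includes(source_text):
    """Return the header names named in #include directives, in order."""
    return INCLUDE.findall(source_text)


def resolve_includes(source_text, include_dirs):
    """Map each included header to the first matching file under include_dirs, or None."""
    resolved = {}
    for header in list_includes(source_text):
        resolved[header] = None
        for directory in include_dirs:
            candidate = Path(directory) / header
            if candidate.is_file():
                resolved[header] = candidate
                break
    return resolved
```

This only follows one level; to see every header pulled in transitively, the compiler's own preprocessor output is the more reliable source.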

 

I have included the readme text for my server-side conversion code so far.

 

While I am thinking of it, I saw an earlier email about the source of the help.

For online help, create the pages with Writer and export them as .xhtml files; headers and footers are not exported.

 

I tried dumping an .odt file. I could not see the text or fonts (they must be encoded), but other parts, like the metadata, were not encoded. It seemed to waste a lot of space. I saw a lot of other material that made it seem as if our files are being fed into another language (XML?).
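One likely reason the text looks encoded in a raw dump is that parts inside the Zip container are usually deflate-compressed, while the mimetype entry is required by ODF to be stored uncompressed. A short Python sketch (function name hypothetical) reports how each part is stored, so you can see which parts a raw dump will show as readable text.

```python
import zipfile


def storage_report(path):
    """Return (name, method, compressed_size, file_size) for each part of a Zip package."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename,
             methods.get(info.compress_type, str(info.compress_type)),
             info.compress_size,
             info.file_size)
            for info in zf.infolist()
        ]
```

Reading a part through `zipfile.ZipFile.read()` decompresses it, which is the easiest way to see the actual XML of content.xml or styles.xml.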

 

I may consider combining the .html and .xhtml output to get a better result.

 

Howard

 

From: [hidden email]
Sent: Thursday, September 13, 2018 1:03 AM
To: [hidden email]
Cc: [hidden email]
Subject: RE: html code generated from Open Office

 

Only reason I want to see Print and print preview code is that I want the HTML5 look identical to printed code. I will have additional references to understand the code.



Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10



________________________________
From: Damjan Jovanovic <[hidden email]>
Sent: Wednesday, September 12, 2018 7:59:53 PM
To: Howard Cary Morris
Cc: Apache OO
Subject: Re: html code generated from Open Office

Those modules are in our source code, see this link about getting it:
https://openoffice.apache.org/source.html
Building it is quite involved :-/.

What are you looking for in terms of printing?

ODT files are documented but it's a lot of documentation:
https://www.oasis-open.org/standards#opendocumentv1.2

Damjan


On Wed, Sep 12, 2018 at 10:00 PM Howard Cary Morris <
[hidden email]> wrote:

> OK, where can I get a copy of that module for Writer?
>
> Would also like a copy of the module that does printing and the one that
> does print preview.
>
> (Mainly for comparison purposes.)
>
> If there is documentation of the format of the .odt files, I might be able
> to write the module from scratch.
>
>
>
> Howard
>
>
>
> *From: *Damjan Jovanovic <[hidden email]>
> *Sent: *Monday, September 3, 2018 11:20 PM
> *To: *Apache OO <[hidden email]>;
> [hidden email]
> *Subject: *Re: html code generated from Open Office
>
>
>
> Hi
>
>
>
> If it was me, I said:
>
> The code for the "Save As" -> "HTML document" feature seems to be in:
>
>
>
> main/sc/source/filter/html
>
> for Calc, and
>
> main/sw/source/filter/html
>
> for Writer.
>
> (Not sure if there are more?)
>
>
>
> Thank you for your contribution, and please let us know if you need any
> further help.
>
> Damjan
>
>
>
> On Tue, Sep 4, 2018 at 2:55 AM Howard Cary Morris <
> [hidden email]> wrote:
>
> Sorry, I seem to have deleted a more recent version of this request.
> Someone sent me the name of the module that gets invoked for a Save As HTML
> request.
> I would also like to compare that with the module that does the print request
> and the module that does the print preview. What are all those module names?
> Where can I download those modules from?
>
> Thanks Howard
>
> ________________________________
> From: Howard Morris (aka Col Boogie) <[hidden email]>
> Sent: Monday, January 5, 2015 10:36:20 PM
> To: [hidden email]
> Subject: html code generated from Open Office

Attachment: ReadMe.txt (3K)

Re: html code generated from Open Office

Peter Kovacs-3
I took a quick glance at the HTML filter code in Writer. Another reason
the code may be hard to read is that most comments are in German.
We do translations on demand. If you are interested in a specific file,
the best practice is to write a comment in
https://bz.apache.org/ooo/show_bug.cgi?id=39199.

Some examples of how to read includes:

#include <vcl/svapp.hxx>: check the vcl module -> main/vcl/inc/vcl/svapp.hxx

#include <sfx2/docfile.hxx>: check the sfx2 module ->
main/sfx2/inc/sfx2/docfile.hxx

#include <com/sun/star/form/XForm.hpp>: okay, this is special, since it
refers to Java code. But I do not know how to find the corresponding Java
class file.
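
The first two examples follow one convention: the first path component of the include names the module, and the header lives under that module's `inc` directory. A small Python sketch of that rule, assuming it holds for the module you are reading (some headers may live elsewhere, so treat the result as the first place to look, not a guarantee). `guess_header_path` is a hypothetical helper:

```python
import re

# Matches #include <...> and #include "..." and captures the path inside.
INCLUDE_RE = re.compile(r'#include\s*[<"]([^>"]+)[>"]')


def guess_header_path(include_line, source_root="main"):
    """Map '#include <module/file.hxx>' to '<root>/module/inc/module/file.hxx'.

    Returns None when the line is not an include, has no module prefix,
    or names a generated UNO header (.hpp), which this convention does not cover.
    """
    match = INCLUDE_RE.search(include_line)
    if match is None:
        return None
    target = match.group(1)
    module, sep, rest = target.partition("/")
    if not sep or target.endswith(".hpp"):
        return None
    return f"{source_root}/{module}/inc/{module}/{rest}"
```

For instance, `guess_header_path("#include <vcl/svapp.hxx>")` gives `main/vcl/inc/vcl/svapp.hxx`, matching the first example.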

Sadly we are still restoring OpenGrok, so no help there at the moment. I am
searching through Eclipse, but that takes ages. Maybe someone has an idea?

HTH

Peter


On 10/1/18 10:07 PM, Howard Cary Morris wrote:

>
> Looked at code, tried to anyway. Way too many includes, which I don’t
> know how to look up.
>
> I’d be better off with compiled code for modules with all includes
> expanded.
>
> Included read me text of my server code to convert so far.
>
> While I am thinking of it, saw an earlier email about source of help.
>
> For online help, create with Writer and export as .xhtml file headers
> and footers not exported.
>
> Tried dumping .odt file. I could not see text or fonts (must be
> encoded). But other parts were not encoded (like meta data). Seemed to
> waste a lot of space. Saw a lot of other stuff that made it seem like
> our files are being fed into another language (xml?)
>
> May consider combining .html and .xhtml files for better creation.
>
> Howard

Re: html code generated from Open Office

Damjan Jovanovic
On Tue, Oct 2, 2018 at 1:27 AM Peter Kovacs <[hidden email]> wrote:

> #include <com/sun/star/form/XForm.hpp> Okay this is special, since it
> refers to Java code. But I do not know how to find the corresponding Java
> class file.
>

That isn't Java, that is a header file generated from the UNO IDL in
main/offapi/com/sun/star/form/XForm.idl
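
Following that explanation, a generated UNO header path can be mapped back to the IDL file it came from by swapping the extension and prefixing the API module. A Python sketch; besides `main/offapi` (the module named above), it also checks `main/udkapi`, on the assumption that core UNO types such as `com/sun/star/uno/XInterface` are defined there. The helper names are hypothetical:

```python
from pathlib import Path

# offapi is named in the reply above; udkapi is an assumption for core UNO types.
API_MODULES = ("offapi", "udkapi")


def idl_candidates(include_target, source_root="main"):
    """For 'com/sun/star/form/XForm.hpp', list the IDL files it may be generated from."""
    if not include_target.endswith(".hpp"):
        return []
    stem = include_target[: -len(".hpp")]
    return [Path(source_root) / api / f"{stem}.idl" for api in API_MODULES]


def find_idl(include_target, source_root="main"):
    """Return the first candidate IDL file that exists in the checkout, else None."""
    for candidate in idl_candidates(include_target, source_root):
        if candidate.is_file():
            return candidate
    return None
```

Run from the top of a source checkout, `find_idl("com/sun/star/form/XForm.hpp")` would return the offapi path if the file is there.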

Re: html code generated from Open Office

Andrea Pescetti-2
Howard Cary Morris wrote:
> I want the HTML5 look identical to printed code. I will have additional references to understand the code.

I see many mixed ideas in this conversation. Let me give you some
pointers, and sorry for being late to this.

Start here:
https://archive.fosdem.org/2014/schedule/event/improving_the_xhtml_export_filter/
The slides you find there will give you all pointers (source code
modules, issues, patches, history) for the XHTML export filter and the
idea to repurpose it as an HTML5 export filter. The presentation is old
(and looks very old indeed!) but it's still accurate: we didn't change
that export in recent years.

As someone already told you, we have two filters, the HTML one and the
XHTML one. They are in different code modules.

The work has to be done in the source code, so whatever you have done in
PHP and HTML (?) will have to be rewritten. But I (and many others) will
be able to read your current work, assuming you are post-processing the
HTML or XHTML output, and we can give feedback if you make it available
somewhere.

There is a fundamental error in the idea of print fidelity: HTML, and
especially HTML5, are not designed with print fidelity in mind. I mean,
the idea to have the printed HTML5 identical to the OpenOffice (say) PDF
export is unfeasible since HTML rendering is done by the user-agent
(browser) and this is by design subject to what the browser decides to
do. If you constrain the browser too much by enforcing specific CSS, all
advantages of an HTML export will be gone. So the idea should be to have
a proper HTML5 export as a start, ignoring the printed output for the
time being. Priority should be on getting the semantic level (tags)
right, and some basic CSS transformations to get the styles right. Our
export is currently using bad HTML style, but the XHTML one is a bit
better than the HTML one.
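
To make "getting the semantic level (tags) right" concrete, here is a toy Python sketch that walks the body of `content.xml` and maps ODF structure to HTML5 elements: `text:h` with its `text:outline-level` to `h1` through `h6`, `text:p` to `p`, and `text:list` to `ul`/`li`. It is not how the OpenOffice HTML or XHTML filters work, it ignores styles, tables, images and list numbering, and the function names are hypothetical:

```python
import html
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def t(name):
    """Qualified ElementTree name for an element or attribute in the ODF text namespace."""
    return f"{{{TEXT}}}{name}"


def inline_html(elem):
    """Escaped inline content; spans and links are flattened to their text."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag == t("s"):  # run of spaces, count in text:c
            out.append(" " * int(child.get(t("c"), "1")))
        elif child.tag == t("tab"):
            out.append("\t")
        elif child.tag == t("line-break"):
            out.append("<br>")
        else:
            out.append(inline_html(child))
        out.append(html.escape(child.tail or ""))
    return "".join(out)


def blocks_to_html(parent):
    """Convert headings, paragraphs and (unordered) lists; other blocks are skipped."""
    out = []
    for elem in parent:
        if elem.tag == t("h"):
            level = min(max(int(elem.get(t("outline-level"), "1")), 1), 6)
            out.append(f"<h{level}>{inline_html(elem)}</h{level}>")
        elif elem.tag == t("p"):
            out.append(f"<p>{inline_html(elem)}</p>")
        elif elem.tag == t("list"):
            items = "".join(
                f"<li>{''.join(blocks_to_html(item))}</li>"
                for item in elem.findall(t("list-item"))
            )
            out.append(f"<ul>{items}</ul>")
    return out


def odt_body_to_html(content_xml):
    """Return HTML5 body markup for the office:text element of content.xml."""
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE}}}body/{{{OFFICE}}}text")
    return "\n".join(blocks_to_html(body)) if body is not None else ""
```

`ET.fromstring` accepts either a string or the raw bytes of `content.xml` read from the package.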

For print fidelity (but this comes much later)
https://www.w3.org/TR/css3-page/ would be the place to start. It is
wonderful, but support from tools is still quite incomplete. And anyway
implementation will need the ground work above to be completed beforehand.
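
As a first step in that direction, the page size and margins a Writer document declares can be carried over into a CSS `@page` rule. A Python sketch that reads `style:page-layout-properties` from `styles.xml`; by default it takes the first page layout it finds, which is an assumption (the layout actually used is the one a master page points to via `style:page-layout-name`), and browser support for `@page` `size` varies. `page_css` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
MARGINS = ("margin-top", "margin-right", "margin-bottom", "margin-left")  # CSS shorthand order


def page_css(styles_xml, layout_name=None):
    """Build an @page rule from the first (or the named) style:page-layout, else None."""
    root = ET.fromstring(styles_xml)
    for layout in root.iter(f"{{{STYLE}}}page-layout"):
        if layout_name is not None and layout.get(f"{{{STYLE}}}name") != layout_name:
            continue
        props = layout.find(f"{{{STYLE}}}page-layout-properties")
        if props is None:
            continue
        fo = {key: props.get(f"{{{FO}}}{key}") for key in ("page-width", "page-height") + MARGINS}
        rules = []
        if fo["page-width"] and fo["page-height"]:
            rules.append(f"  size: {fo['page-width']} {fo['page-height']};")
        margins = [fo[key] or "0" for key in MARGINS]
        rules.append(f"  margin: {' '.join(margins)};")
        return "@page {\n" + "\n".join(rules) + "\n}"
    return None
```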

The way is long, but we are here to help, even though we are all
volunteers and are often less responsive than we would like to be.

The first step is building OpenOffice on your system. There is no other
way, unfortunately. Does
https://wiki.openoffice.org/wiki/Documentation/Building_Guide_AOO make
any sense to you? If you are lost, we may be able to help if you
describe your system configuration. Linux is probably the easiest
platform for building.

Regards,
   Andrea.


RE: html code generated from Open Office

Howard Morris (aka Col Boogie)
I created a new project on SourceForge, https://sourceforge.net/projects/open-office-html-4-to-html5/. It contains my latest tested conversion. If you cannot read the files, I’ll add more files to make it easier. The version I am currently testing is at http://www.americasfreedompressalliance.us/Howard/Open/ .

To see the source code there, use http://www.americasfreedompressalliance.us/Howard/

and at the bottom of the page you may enter

                Open/index.html

                Open/Gen2.php

                Open/ReadMe.txt

After looking at the text, you may use the browser’s ‘Save Page As’ to get a copy of the source.

I could put a shorter version of those programs in the SourceForge project.



I have looked at the code generated by both filters. The .html filter tries to add the page header and page footer to the generated code, but does a bad job of it; I have cleaned it up somewhat. The .xhtml filter doesn’t even try. The .xhtml output is true to the page width of the OpenOffice document; the .html filter doesn’t even try. One reason I am trying to make the output as print-compatible as possible is to provide an alternative to PDF. The output is much smaller. In fact, if you compress the output together with the image files, the result is much smaller than the .odt file. Also, some users may want the output to look like a (typed) document.
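
Since the .xhtml filter does not export headers and footers, one option for a post-processing approach is to read them straight from the package: in ODF, the headers and footers of a page style are stored in `styles.xml`, inside `style:master-page` as `style:header` and `style:footer`. A Python sketch that returns their paragraphs as plain text; it assumes the page style's internal name is `Standard` (typically the XML name of Writer's default page style), flattens formatting, and leaves fields such as page numbers at whatever value was last saved in the file. `header_footer_text` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def header_footer_text(styles_xml, master_name="Standard"):
    """Return {'header': [...], 'footer': [...]} paragraph texts for one master page."""
    root = ET.fromstring(styles_xml)
    for master in root.iterfind("office:master-styles/style:master-page", NS):
        if master.get(f"{{{NS['style']}}}name") != master_name:
            continue
        result = {}
        for part in ("header", "footer"):
            elem = master.find(f"style:{part}", NS)
            if elem is not None:
                # itertext() joins the text of spans and fields inside each paragraph.
                result[part] = ["".join(p.itertext()) for p in elem.iterfind(".//text:p", NS)]
        return result
    return {}
```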



The ReadMe file tells more of what has been done than what needs to be done.

If I ever get to the table of contents, it will be really amazing.



I was able to get to the source of the .html filter. However, with so many includes, it was impossible to wade through. As I asked before, compiled versions with all the includes expanded would help a lot. There seem to be four programs in the filter.



Howard


________________________________
From: Andrea Pescetti <[hidden email]>
Sent: Saturday, October 13, 2018 3:24:52 PM
To: [hidden email]
Subject: Re: html code generated from Open Office