Converting a Word doc to HTML... any good editors?

I'm about to convert a massive 200+ page user manual into HTML and CSS. It's in MS Word format, which has been a pain. Now I want to convert it over and make it easier to use and maintain.

MS makes a mess in its pants when it tries to write HTML. All the images are gone, the tables are in shambles, there's pee all over the floor...

I'm just starting to learn CSS, doesn't look too hard. I'm also comfortable editing HTML, so I'm wondering if anyone has an editor they prefer? I used HomeSite a few years ago, seemed pretty good, if a bit top-heavy.

Replies

  • pior
    Been using Araneae for years and love it!

    http://www.ornj.net/software/araneae

    It's a very simple text editor, tab-based, with syntax highlighting and in-browser preview. Does exactly what it should, nothing more, nothing less ;)
  • KDR_11k
    Vi(m). Accept no substitutes.
  • Eric Chadwick
    Thanks pior, looks good!
  • Downsizer
    Word 2003 can save to HTML natively. If the images don't show up because of path errors, use the 'Replace All' function in WordPad to alter all the paths at once. Shouldn't take much time at all.

    I'm a Sr. Systems Engineer if any of you ever need something. Just yell.
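
    As a rough illustration of the 'Replace All' approach above, here is a minimal Python sketch. Python is an assumption here; the thread itself only mentions WordPad. It swaps one path prefix for another in every exported page. The function names, the folder and both prefixes are hypothetical, and the sketch assumes the paths are plain ASCII. It works on raw bytes so it does not have to guess which encoding Word used.

```python
from pathlib import Path


def replace_all(data: bytes, old: str, new: str) -> tuple[bytes, int]:
    """Return data with every `old` swapped for `new`, plus the number of hits."""
    old_b, new_b = old.encode("ascii"), new.encode("ascii")
    return data.replace(old_b, new_b), data.count(old_b)


def fix_paths(folder: Path, old: str, new: str) -> int:
    """Apply replace_all to every .htm/.html file in folder; return total hits."""
    total = 0
    for page in sorted(folder.glob("*.htm*")):
        fixed, hits = replace_all(page.read_bytes(), old, new)
        if hits:
            # Only rewrite files that actually contained the old prefix.
            page.write_bytes(fixed)
            total += hits
    return total
```

    Something like fix_paths(Path("manual_html"), "C:\\docs\\img\\", "images/") would rewrite the pages in place, so it is worth keeping a backup copy of the export first.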
  • cholden
    Open Office also allows saving to HTML in its doc editor.
  • hawken
    Your best bet is to copy out of Word into something like Dreamweaver, because the HTML that comes out of Word stinks. The HTML that comes out of Dreamweaver is bad, but at least it works in a browser.

    Personally I use a program called TextPad, but in recent years I have started using EditPlus a bit more.

    KDR, I don't know why you mention Vim in this thread... we are talking about Windows! Any opportunity to pimp Linux :P
  • KDR_11k
    gVim runs under Windows as well, it's what I use.

    HTML Tidy can fix Word HTML files to a certain degree.
  • Eric Chadwick
    Good stuff.

    I've been learning CSS, and I'm liking it. Problem is, the W3C HTML validator pops up some niggling errors, like I can't use <b> for bold, all my <img> tags have to be closed, etc. Do you guys make your code work with XHTML Strict, or do you just get close enough, and not worry about the small errors?

    Also, I was wondering what's a good method of adding a common header & footer to all HTML pages? I don't want to put it explicitly in every file (for ease of editing), nor use frames. I can't figure out how to embed text within a CSS file. Maybe some way to load another HTML within a <div>?

    Thanks for your help.
  • KDR_11k
    I use HTML 4, not XHTML.

    I think a tag without closing tag can be denoted with <.../>, that way you don't have to close 'em.

    Best way to put a common header and footer on everything is a script. Batch is powerful enough (use piping!) for that.
  • MoP
    Eric, I'm currently re-making my site, and I'm making it stick to XHTML practises - hence closing image tags, like KDR says, just put a / before the > of the img section. So far everything seems to be working perfectly on all the browsers I've tested on, so it looks like a good way to go.
  • Illusions
    As for web standards:

    Unless you're going to become a webdesigner, or your code causes some serious flaws with web-browsers don't worry about little things like using pt or px for fonts, or <b> or <strong> for style coding and stuff like that. When people are viewing your website for a non-webdesign position I highly doubt they're going to right click and view source then start browsing your code for errors. Otherwise the CSS stuff and closing tags streamlines your work.

    Another thing that streamlines your work is Dreamweaver, as it can parse your code and color-code it, close tags for you, and delete extraneous tags.

    @Chadwick: Is it only frames you want to avoid, or iframes as well? Iframes differ from frames in that they can sit inside a <div> and load another HTML document, and they can be targeted by <a href> links in both the loaded page and the original main page.

  • Eric Chadwick
    Well the pages have to be easily printable, so I'm avoiding frames mostly for this, but also I've heard rumblings about other problems/issues with them. Iframes sound interesting, will look into this.

    I was trying the validation route, thinking this will make the pages readable on more browsers, past and future. The doc has to be accessible-friendly. It's part of the specification for our system architecture, which will have other users in the near future.

    I'm including another CSS for printing, hopefully that'll help make it print well, though I'm sticking to a fairly vanilla design so maybe that's not necessary. The mandate is to make a document that will last a long time, so the simpler and more standardized the better.

    What is piping? I've used a program in the past called BatchReplaceEM which let me batch find/replace across multiple html files. Worked OK. But CSS seems so much more elegant.

    Oh, I'm also looking for an HTML editor that will let me assign custom code wrappings to hotkeys. E.g. I select a few lines of text, press Ctrl-L, and it spits out:
    <ol class="decimal">
    <li>first line of text</li>
    <li>second line of text</li>
    <li>etc.</li>
    </ol>
    Or even more commonly, specific replacements for <i> like Ctrl-B = <ital class="foo">, etc., so I can quickly tie selected text blocks to classes in my CSS.

    Araneae is great, I like the simplicity of it. Just need this custom-hotkeying as well.

    Thanks again for all the help. Coding is cool, isn't it? I'm actually digging it.
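
    The Ctrl-L wrapping described above is simple enough to sketch outside any particular editor. This is a minimal Python version, assuming the selection arrives as plain text with one list item per line. The function name wrap_as_list and its css_class parameter are hypothetical, and an editor macro would still have to pass the selection in and paste the result back.

```python
from html import escape


def wrap_as_list(selection: str, css_class: str = "decimal") -> str:
    """Wrap each non-empty line of `selection` in <li>, inside one <ol>."""
    items = [
        # Escape &, < and > so the text cannot be mistaken for markup.
        f"<li>{escape(line.strip(), quote=False)}</li>"
        for line in selection.splitlines()
        if line.strip()
    ]
    return f'<ol class="{css_class}">\n' + "\n".join(items) + "\n</ol>"
```

    Every <li> is closed explicitly, which is what an XHTML Strict validator expects.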
  • Illusions
    It won't quite print that easily with iframes. It will print, but it's going to be all chopped up, and not aligned the way it appears on screen.

    As for validation, I'd download a Text Only Browser and see how it displays in there for printing purposes, either that or make a separate file for printing. Like a 'Printable Version' button that pops up a printable version.

    What is this for anyway, that it has to maintain cross-browser compatibility for an extended period of time?

    Edit: you could also use Dreamweaver...which has a Find and Replace feature that can access every page in a site and then replace that found object with another set of text.

    Like let's say you marked up everything you wanted to have a specific style with <~^> and </~^> tags. Since this is worthless code, and wouldn't show up in any real code, you could use find and replace to replace every ~^ with style="blah". This would essentially replace all the open and close tags with the correct code... ^.^
  • Illusions
    Quick test:

    Before:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Untitled Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    <~^>Testing my theory</~^>
    <~^>of the lovely idea</~^>
    <~^>of Dreamweaver find</~^>
    <~^>and replace.<~^>
    <body>
    </body>
    </html>

    After:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Untitled Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    <style="blah">Testing my theory</style="blah">
    <style="blah">of the lovely idea</style="blah">
    <style="blah">of Dreamweaver find</style="blah">
    <style="blah">and replace.<style="blah">
    <body>
    </body>
    </html>

    Works, so long as you choose "Source Code" and then 'Replace All'. Hmm, you may have to set up separate tags for certain things now that I look at it. One for open and then for close. >_<
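
    The "one for open, one for close" idea above could also be sketched in Python with a regular expression that treats each <~^>...</~^> pair as a unit. The assumptions are worth spelling out: Python rather than Dreamweaver, the hypothetical names MARKER and apply_class, a <p class="blah"> element as the replacement target, and marked text that contains no other tags.

```python
import re

# Matches <~^>text</~^>, where the text holds no "<" (so no nested tags).
MARKER = re.compile(r"<~\^>([^<]*)</~\^>")


def apply_class(html: str, css_class: str = "blah") -> str:
    """Turn each <~^>...</~^> pair into <p class="...">...</p>."""
    return MARKER.sub(
        lambda m: f'<p class="{css_class}">{m.group(1)}</p>',
        html,
    )
```

    Because the pattern needs a proper closing marker, a line whose second marker is missing its slash is left untouched rather than half-converted, which makes such slips easy to spot afterwards.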
  • KDR_11k
    Is <style="blah"> even a valid tag?

    My website uses a batch script to assemble the headers, footer and page content, and it validates before I upload it. After that the host completely butchers the HTML with their ad code, which causes roughly fifty validation errors.

    I've just written a new batchfile:
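    @echo off
    echo Generating...

    rem Park the previous build as .src so the final rename cannot collide with it.
    ren ..\output\*.htm *.src
    for %%F in (*.src) do (
    echo %%F
    rem A single > starts the output file afresh; >> would append to the last build.
    type head.txt > ..\output\%%F
    type %%F >> ..\output\%%F
    type foot.txt >> ..\output\%%F
    )
    rem Give the finished pages their .htm extension again.
    ren ..\output\*.src *.htm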

    The idea is that you put the part that changes from page to page into the pagename.src files in a sourcefiles directory (e.g. homepage\source) and it'll generate the files in the ..\output directory (when placed in homepage\source that ends in homepage\output). The header and footer go into head.txt and foot.txt.

    Maybe that sounds a bit complicated but all you have to do to make a new page is to write the relevant bits into the .src file and run the batch.
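
    For anyone not on Windows, roughly the same assembly step can be sketched in Python. This is an illustration rather than the script the site actually uses. It assumes the layout described above (head.txt, foot.txt and the .src files in one source folder, pages written to a separate output folder), and the function name build_site is hypothetical.

```python
from pathlib import Path


def build_site(source: Path, output: Path) -> list[str]:
    """Write head.txt + page.src + foot.txt to output/page.htm for every page."""
    head = (source / "head.txt").read_bytes()
    foot = (source / "foot.txt").read_bytes()
    output.mkdir(parents=True, exist_ok=True)
    built = []
    for src in sorted(source.glob("*.src")):
        target = output / (src.stem + ".htm")
        # write_bytes replaces any previous build instead of appending to it.
        target.write_bytes(head + src.read_bytes() + foot)
        built.append(target.name)
    return built
```

    Run from homepage\source, build_site(Path("."), Path("../output")) would regenerate every page, and running it twice gives the same files both times.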
  • -Onyx-
    Pay for a webhost, and they won't butcher your code with their ads. I use 1and1 for my webhost. They have very good prices, and offer a lot. I have never had any issues with them...they are completely reliable.
  • KDR_11k
    I don't need a quality website, just something to host files and the occasional pimp. Eric probably has a website already (or it's meant for offline use) since that's for business purposes.
  • -Onyx-
    1and1 has webhosting (with photo album) starting at $2.99/Month. They also offer a Free 3 page website if you register a Domain through them ($5.99 Annual Fee). Getting a Domain, and website is cheap!
  • KDR_11k
    Yeah, well, Lycos is free and does what I need.
  • -Onyx-
    Well you were stating that it breaks your code with their ads so I showed you an alternative.
  • KDR_11k
    The site displays normally, the ads just fail the W3C validator.
  • Illusions
    [ QUOTE ]
    Is <style="blah"> even a valid tag?


    [/ QUOTE ]

    Actually, now that I look at it, it should be <p class="blah">, with proper CSS styling defining what blah is.
  • KDR_11k
    I always use div, not p...
  • Illusions
    [ QUOTE ]
    I always use div, not p...

    [/ QUOTE ]

    That's umm...quite pointless. Any particular reason? Because even professionals still use <p> paragraphs and link styles to them, in conjunction with the use of <h> headers.
  • Asthane
    [ QUOTE ]
    [ QUOTE ]
    I always use div, not p...

    [/ QUOTE ]

    That's umm...quite pointless. Any particular reason? Because even professionals still use <p> paragraphs and link styles to them, in conjunction with the use of <h> headers.

    [/ QUOTE ]

    They both come with unwanted attributes. I really can't understand why there's no null tag to use with CSS. I'd almost consider using something like <u> and neutralizing its effect in the CSS, if that didn't ruin the look of the unstyled content.

    CSS in general seems rather badly designed, in my opinion though. Now, don't get me wrong, it's still a great timesaver, but it really pisses me off when there's a property I want to set, and no corresponding property in CSS (vertical align [I think?] comes to mind)
  • Illusions
    [ QUOTE ]
    They both come with unwanted attributes. I really can't understand why there's no null tag to use with CSS. I'd almost consider using something like <u> and neutralizing its effect in the CSS, if that didn't ruin the look of the unstyled content.

    CSS in general seems rather badly designed, in my opinion though. Now, don't get me wrong, it's still a great timesaver, but it really pisses me off when there's a property I want to set, and no corresponding property in CSS (vertical align [I think?] comes to mind)

    [/ QUOTE ]

    Like what unwanted attributes, missing elements...and what would you do with a null tag? Highly confused >_< since I'm new to using CSS anyway...
  • Eric Chadwick
    Well, the document is part of the technical specification for our system architecture, which has been carefully designed to last a long time. It's pretty amazing really, some info on our site, also quite a bit of info in this CGTalk thread.

    Anyhow the doc is meant to be used by anyone who uses the system to create/edit/deliver content. Which could include those who need accessibility features to "read" it (large fonts, braille, text to speech, etc.). It'll probably evolve into another format by then, but I might as well start it off on the right foot. Also it will likely start out with client-only access, but will eventually get into the public realm.


    If I understand this correctly, I think you'd want a null tag so that browsers that don't support CSS would just ignore the tag, displaying the contents as plain text. But if CSS is supported, the .css file would parse that null tag and add a style to the text inside it.

    So then I guess a good idea for me might be to use vanilla tags, like <p class="harry">, then use the CSS to replace those with new styling for a more refined look in browsers that support CSS. Sounds like that just might work.

    So now I'm looking for a fairly simple text editor, à la Araneae, except with customizable hotkeyable styles. That would rock my world. Well at least until I take off for the break.
  • Asthane
    P has margins I always have to change when I use it, and I've had lots of funky behavior with them in different browsers when I change them. I don't recall if div has any, but neither p nor div can be used 'in line'. I'm no pro or anything either though, usually my CSS projects are reiterations of my own webpages or such.

    As for the types of things I'd use a null tag for: as Eric said, it displays poorly, but more than that-- usually collections of font styles I don't want to apply to the entire paragraph or margins I want to apply to more than one paragraph and divs won't work (CSS loses some of its functionality when you're doing class="leftmarginsmall box header strong" :P). It's not common, but it usually comes up in something. If I had access to my last project I'd give you a specific example, but it was lost in a server mishap =/

    Actually, come to think of it, <font> might be a good null tag...

    [edit] Oh, and as for your document, your description screams out 'PDF' to me, since they're designed to look and work identically on most machines, are generally uneditable by end users, and have a number of features that are useful for reference material like bookmarks and such. Just make sure you make good PDFs if you go that route. So many people seem to hate the format because of slow load/display times, unselectable text and such, virtually all of which generally comes down to the person who made the file not doing it well.
  • Illusions
    I'm still highly confused. I'd probably need an example, because I'm not quite seeing where CSS would perform that horribly. Or where a standard <div> <p> <h> or <span> wouldn't suffice. If there's any situation you can think of I guess list it and I'll see if I can code it using CSS and HTML.
  • KDR_11k
    <div> does nothing except cause a linebreak, <span> does absolutely nothing, that's the null-tag you're looking for.
  • -Onyx-
    The <span> tag is used to group inline elements in a document.

    Use the <span> tag to group inline elements so you can format them with styles.
  • Illusions
    Arghhh, <div> and <span> are not null tags, they have functions that get assigned through CSS...

    Adding code that does nothing to your website is like adding extra polygons to your model that don't define anything. The only exception to this is the use of comment tags...
  • Asthane
    [ QUOTE ]
    <div> does nothing except cause a linebreak, <span> does absolutely nothing, that's the null-tag you're looking for.

    [/ QUOTE ]

    Thank you, sorry that ended up derailing this thread so much o_0
  • KDR_11k
    Illusions: I suppose the null referred to the function without CSS. Would blanko tag sound better to you?
  • Eric Chadwick
    I was creating a PDF from it, but there's a big problem with PDF output from Word: the cross-linking breaks easily or doesn't export at all, I have to turn on bookmarking for every heading style (which then stores a load of extraneous bookmarks), it takes forever to save the PDF, etc. It's a real mess.

    This doc has a ton of hyperlinking, it's a very useful feature. That's actually the primary impetus to move this to HTML.

    Thanks again everyone for the input, much appreciated.
  • Eric Chadwick
    Found a good freeware editor, http://www.html-kit.com. Love the custom-hotkeyable functions.

    Dreamweaver was auto-completing incorrectly, adding semicolons to the ends of special characters (the TM one) every time I saved the file, which then broke the W3C HTML 4.01 validation.

    It's been fun learning about the ins and outs of good HTML/CSS code. I can see how coders can be creative types.
  • KDR_11k
    Strange, the W3C validation never complained about semicolons on my special characters. OTOH, Opera complains when there AREN'T semicolons on the end.