Fred Hatfull

A Blog


How to Watermark Audio

Apr 29th

Posted by Fred in Talks

1 comment

The following post contains the notes for a talk I am going to give at Case Western Reserve University for MATH 408 – Introduction to Cryptography. Implementations of the first two algorithms can be found on my Github page, and I may implement the third one over the summer if time permits. The talk is based on a research paper written for the class, which can be downloaded here and explains the techniques discussed in this post in further detail.

 

What is Watermarking?

With the rise of the digital age and the internet, it has never been more convenient to store, copy, and transmit large quantities of information. As digital information becomes more prevalent, so does the need to secure its communication. A number of cryptographic schemes in use today rely on computationally infeasible problems to ensure the security of transmitted information or to verify the authenticity of a document or party. Steganography attempts to hide information, rather than secure it, by embedding it in an otherwise innocuous host in such a way that distortion to the host is minimal and cannot be detected, either perceptually or mathematically, unless it is already known to exist. Watermarking is a type of steganography that attempts to mark a signal with information robustly, so that the mark can’t be removed or mangled even if it is detected.

 

Cryptographic Schemes

  • Transmission of ciphertext occurs in the clear. It is assumed that the encrypted data can be detected and obtained by an adversary.
  • Relies on computationally “hard” problems to maintain the secrecy of the information between the trusted parties.

Watermarking Schemes

  • Transmission is hidden in unrelated material
  • Relies on perceptual and statistical properties to evade detection
  • Tries to ensure that the mark survives both typical signal transforms and deliberate removal attempts

 

Why would I want to watermark audio?

  • Copyright protection – Watermarks containing unique identifiers could be placed into a copyrighted work as it is distributed. If illegal duplication is suspected, the watermark could be used to confirm the source of the violation.
  • Authenticity verification – A unique watermark could be embedded into a “true” signal to distinguish it from counterfeit or fraudulent signals
  • Provide supplementary information – A signal’s metadata could be embedded into the signal itself and be available even after transforms to the signal, a property which existing metadata schemes currently lack

 

Vocabulary

Steganography – the practice of hiding messages in other information

cover audio – the signal into which hidden information is to be embedded

pseudo-noise (PN) sequence – a sequence of pseudo-randomly chosen 0s and 1s. The signal should exhibit properties similar to a random distribution and is therefore a type of noise.

What Makes a Good Watermark?

Secrecy

Ideally, a watermark should be undetectable. If a watermark introduces too much distortion into the cover signal, it may be noticed by a human, a computer, or both. A watermarked signal should be perceptually and statistically indistinguishable from the cover signal to avoid suspicion.

Robustness

A watermark should be present in a signal even after normal transformations have been applied to the signal. Some common transformations that can threaten a watermark include

  • “Lossy” encoding, where perceptually insignificant information is discarded to reduce the size of the signal
  • Cropping
  • Analog-to-digital and digital-to-analog conversion
  • Addition of carrier noise

Watermarks should also be robust against deliberate removal or corruption attacks, which attempt to make the watermark undetectable and unrecoverable from the signal.

Capacity

A watermarking mechanism should provide capacity to store enough information to allow the watermark to be useful.

 

Least Significant Bit (LSB) Hiding

How is audio represented?

In order to examine the schemes presented here, it is important to understand how audio is represented in digital systems. Sound travels in waves, and digital representations of sound simply store how loud a wave is – its amplitude – as it changes over time. This is accomplished by storing the amplitude as a number, called a sample. Thousands of samples are stored every second; in fact, stereo “CD quality” audio is sampled at a rate of 44.1 kHz, or 44,100 samples taken every second for each channel! The rate at which audio samples are taken is called the sampling rate.

Each sample is just a number, and every number can be represented as a series of bits. This is exactly how the computer sees and stores each sample, and these bits can be manipulated directly to hide information. Each bit in the series corresponds to a power of 2, depending on its position. The bit corresponding to the 0th power of 2, or 1, is called the least significant bit, because its value has the lowest impact on the overall number represented by the bitstring.

An example byte illustrating the least significant bit

An example byte corresponding to the number 18. The rightmost bit, a 0, is the least significant bit in this representation.
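To make the caption concrete, here is a tiny Python sketch (my own illustration, not taken from the talk's repository) that prints the bits of 18 and shows how little the value moves when only the least significant bit changes.

```python
sample = 18

# Format as an 8-bit string; the rightmost character is the least significant bit.
bits = format(sample, "08b")

# Masking with 1 isolates the least significant bit.
lsb = sample & 1

# Forcing the LSB to 1 changes the value by at most 1.
with_lsb_set = sample | 1

print(bits)          # 00010010
print(lsb)           # 0
print(with_lsb_set)  # 19
```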

The Algorithm

Least significant bit hiding attempts to simply encode watermark information into the LSB of each sample in the host signal. Each sample’s LSB is changed to reflect the next bit in the watermark until there are no more watermark bits to encode. Variants on this algorithm exist and include performing an exclusive or (XOR) between the LSB and the watermark bit, as well as encoding the watermark repeatedly until the cover signal is exhausted.
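The basic variant described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code from the Github repository: samples are plain Python integers, and the helper names (`embed_lsb`, `extract_lsb`, `bytes_to_bits`, `bits_to_bytes`) are hypothetical.

```python
import random


def bytes_to_bits(data):
    """Expand bytes into a list of 0/1 ints, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of 0/1 ints (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_lsb(samples, watermark_bits):
    """Return a copy of samples with the watermark written into the LSBs."""
    if len(watermark_bits) > len(samples):
        raise ValueError("watermark is longer than the cover signal")
    marked = list(samples)
    for i, bit in enumerate(watermark_bits):
        # Clear the LSB, then set it to the next watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples, n_bits):
    """Read the first n_bits LSBs back out of the signal."""
    return [s & 1 for s in samples[:n_bits]]


# Hypothetical cover: random 16-bit signed samples standing in for real audio.
rng = random.Random(0)
cover = [rng.randint(-32768, 32767) for _ in range(256)]

message = b"MATH 408"
marked = embed_lsb(cover, bytes_to_bits(message))
recovered = bits_to_bytes(extract_lsb(marked, 8 * len(message)))
print(recovered)
```

Note that the decoder needs to know how many bits to read; a real scheme would have to agree on a length or a terminator in advance.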

LSB hiding is an elementary algorithm, and presents a few advantages along with some significant weaknesses.

Advantages

  • High channel capacity. Because a bit can be encoded into each sample, the bandwidth available to the watermarking scheme is dictated by the sample rate of the cover signal. As most audio is encoded at 44.1 kHz, LSB hiding can allow for up to 44.1 kbps per channel, an impressive capacity compared to the other schemes examined here.
  • Perceptually invisible. A human listener will not be able to tell the difference between the original cover and the watermarked signal for a typical signal with a 16-bit sample depth. The difference in amplitude is a maximum of 1 in a range of 2^16 = 65536 values.
  • Fast to encode and fast to decode.

Disadvantages

  • A watermark hidden using LSB hiding will fail to survive even the most basic signal manipulation. Even at high quality, a lossy encoding of the signal will totally change the LSBs of an audio signal, rendering the watermark unrecoverable. Noise introduced, even if below the human perceptible threshold, will also destroy the watermark. Geometric transforms and rescaling of the signal will also damage or destroy the mark. Because of this weakness, LSB watermarks are very easy to attack and destroy and thus provide unacceptable robustness for real-world applications.
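To get a feel for how fragile the mark is, the following sketch embeds a random watermark in the LSBs of a made-up cover signal and then turns the volume down by 1%. The cover data, the seed, and the choice of a 1% gain change are all assumptions of mine for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical cover: 1000 random 16-bit samples, plus a random watermark.
cover = [rng.randint(-32768, 32767) for _ in range(1000)]
watermark = [rng.randint(0, 1) for _ in range(1000)]

# Embed one watermark bit in the LSB of each sample.
marked = [(s & ~1) | bit for s, bit in zip(cover, watermark)]

# "Attack": reduce the volume by 1% and round back to integer samples.
attenuated = [round(s * 0.99) for s in marked]

# Try to read the watermark back out of the attenuated signal.
recovered = [s & 1 for s in attenuated]
errors = sum(r != w for r, w in zip(recovered, watermark))
error_rate = errors / len(watermark)

print(f"bit error rate after a 1% gain change: {error_rate:.0%}")
```

An error rate near 50% means the recovered bits are no better than coin flips, which is what one would expect once the low-order bits have been scrambled.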

Demo

An example implementation of an LSB hiding scheme is provided for reference. The implementation is done using Python 2.7 and can be found on my Github page.

 

Echo Hiding

The Algorithm

Echo hiding exploits the fact that the human auditory system is unable to detect very short echoes (on the order of a millisecond or less) in sound. By alternating between two sub-perceptible echo delays, binary information can be encoded into an audio signal.

The first step in echo hiding is to determine the delays of the echoes that will correspond to 0s and 1s, which we will call d0 and d1, respectively. These delays should be chosen between 0.5 milliseconds, below which it can be hard to detect the echo computationally, and 1 millisecond, above which an acute listener could start perceiving a distortion in the signal.

Next, two derivatives of the cover signal are generated, one with the “zero” echo applied and one with the “one” echo applied. A mixer signal is also generated based on the binary representation of the watermark. The mixer signal dictates when the “zero” echo should be in effect and when the “one” echo should be in effect. This signal is applied to the cover with the zero echo and the cover with the one echo, having the effect of turning the one signal on when a one should be encoded and turning the zero signal on when a zero should be encoded. The two resulting signals are added together to produce the final watermarked signal.
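The encoding steps above can be sketched as follows. This is a simplified illustration rather than the repository's implementation: the echo strength `alpha`, the segment length, and the white-noise cover are assumptions, and the mixer here switches abruptly between segments, where a practical system would likely smooth the transitions.

```python
import random

SAMPLE_RATE = 44100
D0 = round(0.0005 * SAMPLE_RATE)  # "zero" echo delay: 0.5 ms, i.e. 22 samples
D1 = round(0.0010 * SAMPLE_RATE)  # "one" echo delay: 1 ms, i.e. 44 samples


def with_echo(signal, delay, alpha):
    """Return the signal plus a copy of itself delayed by `delay` samples."""
    return [x + alpha * (signal[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(signal)]


def echo_encode(cover, bits, segment_len, alpha=0.5):
    """Encode one bit per segment by choosing which echo is switched on."""
    if len(bits) * segment_len > len(cover):
        raise ValueError("cover is too short for this watermark")
    zero_version = with_echo(cover, D0, alpha)
    one_version = with_echo(cover, D1, alpha)
    # Mixer signal: 1.0 wherever a one is encoded, 0.0 elsewhere.
    # Samples past the end of the watermark default to the "zero" echo.
    mixer = [float(bits[n // segment_len]) if n // segment_len < len(bits) else 0.0
             for n in range(len(cover))]
    # Apply the mixer to the "one" version, its complement to the "zero"
    # version, and add the two results together.
    return [m * one + (1.0 - m) * zero
            for m, one, zero in zip(mixer, one_version, zero_version)]


# Hypothetical cover: seeded white noise standing in for real audio.
rng = random.Random(0)
cover = [rng.gauss(0.0, 1.0) for _ in range(4 * 1024)]
bits = [1, 0, 1, 1]
marked = echo_encode(cover, bits, segment_len=1024)
```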

Recovery

Echo detection is accomplished using a computational technique called the autocepstrum. The autocepstrum is a combination of two signal processing techniques – the cepstrum, which attempts to separate the echoes in a signal from the original signal, and autocorrelation, which attempts to find repeating patterns in a signal. Details about the autocepstrum are covered in the paper and are described elsewhere; for the sake of brevity, I will omit technical details about the computation here.

Computing the autocepstrum of a watermarked signal in chunks gives an estimate of the echo delay in each chunk. If an echo is detected at d0, then it is known that a 0 was encoded. Likewise, if an echo is detected at d1, then it is known that a 1 was encoded. The resulting string of 0s and 1s forms the recovered watermark.
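As a rough illustration of the decision rule only, the sketch below swaps in plain autocorrelation for the full autocepstrum and compares the correlation at the two candidate delays in each chunk. That shortcut works here because the assumed cover is white noise, which has no structure of its own at those lags; real audio does, so this should not be read as a working decoder. The compact `encode` helper, the constants, and the seed are my own assumptions.

```python
import random

D0, D1 = 22, 44   # echo delays in samples (about 0.5 ms and 1 ms at 44.1 kHz)
SEGMENT = 4096    # samples per encoded bit (assumed)
ALPHA = 0.5       # echo strength (assumed)


def encode(cover, bits):
    """Add the d0 or d1 echo to each segment depending on the bit it carries."""
    out = []
    for n, x in enumerate(cover):
        index = n // SEGMENT
        bit = bits[index] if index < len(bits) else 0
        delay = D1 if bit else D0
        echo = cover[n - delay] if n >= delay else 0.0
        out.append(x + ALPHA * echo)
    return out


def autocorr(chunk, lag):
    """Unnormalized autocorrelation of a chunk at a single lag."""
    return sum(chunk[n] * chunk[n - lag] for n in range(lag, len(chunk)))


def decode(signal, n_bits):
    """Decide each bit by asking which candidate delay correlates more strongly."""
    bits = []
    for i in range(n_bits):
        chunk = signal[i * SEGMENT:(i + 1) * SEGMENT]
        bits.append(1 if autocorr(chunk, D1) > autocorr(chunk, D0) else 0)
    return bits


# Hypothetical cover: seeded white noise standing in for real audio.
rng = random.Random(3)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
cover = [rng.gauss(0.0, 1.0) for _ in range(len(bits) * SEGMENT)]
marked = encode(cover, bits)
recovered = decode(marked, len(bits))
print(recovered)
```

Note that the decoder never touches `cover`: only the two delays and the segment length are needed.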

Advantages

  • Robust. Watermarks hidden using echo hiding can survive very low quality encoding schemes, analog to digital and digital to analog conversion, and the addition of noise into the signal. Blind echo detection and cancellation is widely regarded as being a difficult problem, allowing echo hiding a good deal of protection against detection and removal.
  • Good capacity. Research has shown bitrates of as high as 64 bits per second to be possible for certain kinds of cover signals.
  • Original signal and watermark are not needed to recover the watermark.
  • Almost totally imperceptible – if perceived, simply results in a slightly richer signal, but no objectionable distortion.

Disadvantages

  • Can only be used on certain types of cover audio. If the cover is sparse and has significant portions of silence then the watermarking scheme will attempt to encode information into regions of the signal where there is no capacity to do so, resulting in a mangled watermark.
  • Like any encoding scheme that occurs in the time domain, geometric transforms like cropping can render the watermark useless if only clips of the signal are needed.
  • While echo cancellation is a hard problem, there are schemes that can perform echo cancellation quite well in some circumstances.

Demo

An example implementation of an echo hiding scheme is provided in the Github repository for this talk. The implementation is not currently capable of completely decoding the audio deterministically due to a lack of time in preparing the presentation; however, it is provided to show that it is possible to recover an almost identical signal to the original watermark if one knows the echo times of the original encoding.

 

Spread-Spectrum Human Auditory Masking Hiding

This technique, which I lovingly call SSHAM, exploits knowledge about the human auditory system to embed pseudo-noise (PN) sequences in audio. When sounds are made at a certain pitch, sounds made simultaneously at a similar pitch but lower in amplitude are masked by the human perceptual system. Likewise, as seen in echo hiding, sounds made soon after other louder impulses are also masked.

The Algorithm

The signal is first broken down into segments and examined in the frequency domain using an operation called the Discrete Fourier Transform. This lets us examine the volume of each signal segment as it changes with frequency, rather than examining how the signal changes over time. Once the signal is in the frequency domain, we examine it and divide it into its tonal and non-tonal components.

Humans tend to classify sounds into two categories: tonal and non-tonal. Tonal sounds tend to have a narrow frequency band and have a distinct pitch to them; examples include a bird’s song, a musical instrument, or a human voice. Non-tonal sounds tend to have a wide frequency band and exhibit properties perceptually closer to noise. Examples of non-tonal sounds include the crunching of dry leaves, radio static, or the rustling of a plastic bag. Identifying the tonal and non-tonal components to a sound is important, as the human auditory system exhibits different sensitivities to tonal components than it does to non-tonal components.
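Here is a small sketch of what the frequency-domain view buys us. It computes a naive Discrete Fourier Transform of a pure tone and of white noise, then measures how much of the spectral energy sits in the single strongest bin. The `peak_energy_share` measure is a hypothetical stand-in of mine for illustration; it is not the tonal/non-tonal classifier used by the scheme.

```python
import cmath
import math
import random


def dft(segment):
    """Naive O(n^2) Discrete Fourier Transform of a list of samples."""
    n = len(segment)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(segment))
            for k in range(n)]


def peak_energy_share(segment):
    """Fraction of spectral energy (bins 1 to n/2 - 1) in the strongest bin."""
    spectrum = dft(segment)
    energy = [abs(c) ** 2 for c in spectrum[1:len(segment) // 2]]
    return max(energy) / sum(energy)


N = 64

# A "tonal" segment: a sine wave completing exactly 5 cycles in the segment.
tone = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]

# A "non-tonal" segment: seeded white noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

tone_share = peak_energy_share(tone)
noise_share = peak_energy_share(noise)
print(f"tone:  {tone_share:.2f} of the energy in one bin")
print(f"noise: {noise_share:.2f} of the energy in one bin")
```

The tone concentrates essentially all of its energy in one frequency bin, while the noise spreads it across the band, which matches the narrow-band versus wide-band distinction above.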

Once identified, tonal components that lie close enough in frequency to other tonal components such that they are already masked are removed from the signal; likewise, components of the signal that lie below the absolute audible threshold are also removed. This leaves us with a signal which contains only the perceptually significant components in the cover audio. Using the tonal and non-tonal components of the signal, as well as empirical data about the human auditory system’s masking characteristics, a threshold describing the volume below which sounds are imperceptible from the cover signal is generated. The watermark, a PN-sequence, is then filtered with this threshold, resulting in a watermark signal in the frequency domain that will be imperceptible to human listeners.

The watermark is then shifted forward in time slightly to account for temporal masking and then added to the original cover signal.

A final step is taken to improve the robustness of the system against lossy encoding. Because lossy encoding schemes discard perceptually insignificant information in order to compress the signal, the watermark is encoded twice: once in the regions of the signal that are preserved by lossy encoders, and once again in the regions typically discarded by lossy encoders (for increased detection rates).

Recovery

In order to detect the presence of the watermark in a signal, both the original cover signal and the watermark are needed. The original cover signal is subtracted from the received, potentially watermarked signal. The result will either be noise resulting from encoding differences between the signals, or noise plus the watermark PN-sequence. Because PN-sequences autocorrelate very well, the result of the subtraction is correlated against the watermark sequence. If the correlation crosses a reasonable threshold, the watermark is judged to be present in the signal.
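The detection step can be sketched as below. To keep it short I make several assumptions that are not part of the scheme as described: the PN-sequence is mapped to ±1 values, it is added at a fixed strength in the time domain (the masking-threshold shaping is skipped entirely), and the threshold is simply half the embedding strength.

```python
import random

N = 4096          # number of samples (assumed)
STRENGTH = 0.05   # watermark amplitude (assumed; no perceptual shaping here)


def pn_sequence(seed, length):
    """A seeded pseudo-noise sequence of +1/-1 values."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def correlation_score(received, cover, pn):
    """Subtract the cover, then correlate what is left against the PN-sequence."""
    residual = [r - c for r, c in zip(received, cover)]
    return sum(x * p for x, p in zip(residual, pn)) / len(pn)


# Hypothetical cover and channel noise, both seeded white noise.
rng = random.Random(42)
cover = [rng.gauss(0.0, 1.0) for _ in range(N)]
channel_noise = [rng.gauss(0.0, 0.1) for _ in range(N)]
pn = pn_sequence(seed=1234, length=N)

marked = [c + STRENGTH * p + e for c, p, e in zip(cover, pn, channel_noise)]
unmarked = [c + e for c, e in zip(cover, channel_noise)]

threshold = STRENGTH / 2
marked_score = correlation_score(marked, cover, pn)
unmarked_score = correlation_score(unmarked, cover, pn)

print("marked signal:  ", marked_score > threshold)
print("unmarked signal:", unmarked_score > threshold)
```

The score for the marked signal lands near `STRENGTH`, while the unmarked score hovers near zero, because the channel noise is uncorrelated with the PN-sequence and averages out.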

Advantages

  • Very robust. Can survive extremely lossy compression, geometric transforms, and the addition of more watermarks or other noise.
  • Can be used with any kind of cover audio.
  • Very high recovery rates due to the excellent autocorrelation properties of PN-sequences.
  • Perceptually and statistically undetectable.

Disadvantages

  • The watermark data must be represented as a PN-sequence. This is fine for use with many cryptographic schemes, but makes encoding longer messages or unprotected data infeasible.
  • Recovery requires both the original cover signal as well as the watermark.
  • Vulnerable to collusion attacks, where conspiring individuals average versions of the signal with different watermarks together, resulting in a garbled or corrupted watermark.
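The collusion attack in the last point is easy to simulate with a simplified detector. In this sketch (same caveats as any toy model: ±1 PN-sequences, fixed embedding strength, no perceptual shaping, all of which are my assumptions), five copies of the same cover carry five different watermarks, and the colluders average them sample by sample.

```python
import random

N = 4096
STRENGTH = 0.05
THRESHOLD = STRENGTH / 2


def pn_sequence(seed):
    """A seeded pseudo-noise sequence of +1/-1 values."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(N)]


def score(received, cover, pn):
    """Correlate (received - cover) against one watermark's PN-sequence."""
    return sum((r - c) * p for r, c, p in zip(received, cover, pn)) / N


rng = random.Random(7)
cover = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Five buyers, five distinct watermarks embedded in the same cover.
marks = [pn_sequence(seed) for seed in range(5)]
copies = [[c + STRENGTH * p for c, p in zip(cover, pn)] for pn in marks]

# The colluders average their copies sample by sample.
colluded = [sum(samples) / len(copies) for samples in zip(*copies)]

individual_scores = [score(copy, cover, pn) for copy, pn in zip(copies, marks)]
colluded_scores = [score(colluded, cover, pn) for pn in marks]

print("detected in own copy:     ", [s > THRESHOLD for s in individual_scores])
print("detected in averaged copy:", [s > THRESHOLD for s in colluded_scores])
```

Averaging k copies scales each individual watermark down by roughly a factor of k, so with enough colluders every mark drops below the detection threshold.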

 

While I was unable to implement this algorithm in time, the other two algorithms can be found at the Github repository. Additional technical details about the algorithms described here, as well as links to the original authors of the schemes, can be found in the paper, which is available for download here. This certainly is not a comprehensive survey of audio watermarking schemes. If you have a favorite that isn’t discussed here, feel free to post it in the comments!

References

  1. P. Dutta, D. Bhattacharyya, and T. Kim, “Data Hiding in Audio Signal: A Review,” in International Journal of Database Theory and Application, vol. 2, no. 2, June 2009.
  2. D. Gruhl, W. Bender, and A. Lu, “Echo hiding,” in Information Hiding: 1st Int. Workshop (Lecture Notes in Computer Science), vol. 1174, R. J. Anderson, Ed. Berlin, Germany: Springer-Verlag, 1996, pp. 295-315.
  3. L. Boney, A. H. Tewfik, and K. N. Hamdy, “Digital watermarks for audio signals,” in Proc. 1996 IEEE Int. Conf. Multimedia Computing and Systems, Hiroshima, Japan, June 17-23, 1996, pp. 473-480.
  4. F. Petitcolas, R. Anderson, and M. Kuhn, “Attacks on copyright marking systems,” in Information Hiding, ser. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin / Heidelberg, Nov. 1998, vol. 1525, ch. 16, pp. 218–238.
  5. Y. Wu, “Nonlinear Collusion Attack on a Watermarking Scheme for Buyer Authentication,” in IEEE Transactions on Multimedia, vol. 8, no. 3, June 2006, pp. 626-629.
audio, cryptography, information hiding, math408, steganography, watermarking
Inga Borga and the Space Train

All Aboard!

Dec 8th

Posted by Fred in Space Train

No comments


Inga Borga is a sweet, but lonely, old woman. These days she spends a lot of time reading poetry, not the least of which is authored by the esteemed poet Stanislav Slavinsky. Mr. Slavinsky is reading his poetry live on the nearby Planet Deux, a mere short hop by Space Train from Inga’s home. Inga wants nothing more than to see Stanislav in person, so she embarks on the Mustachio Express to Planet Deux. Little does she know it will be a bumpy ride…

Space Train is now available for Mac and Windows! Download links are here:

Mac (55 MB)

Windows (51 MB)

Space Train is a point-and-click adventure game developed over the course of a semester by students at the Cleveland Institute of Art and Case Western Reserve University. It features a dynamic, event-driven level scripting engine that supports characters, items, inventory, dialogue, and more. Space Train is written in Python using the Pyglet graphics and audio library.

The Space Train team consists of

  • Steve Johnson – Lead programmer, screenplay, music, engine, UI
  • Kelsey Bass – Lead artist, character design/line work/animation
  • Fred Hatfull – Level scripting, engine, UI
  • Liz Keegan – Environments, objects
  • Tyler Goeringer – Level scripting, engine
  • Susie Kim – Objects
  • Drew Carrow – Character design, environment/objects
  • Sean Murphy – Sound, pause screen
  • Dylan Carrow – Coloring, objects
  • Vincent Pizarski – Character design, coloring, exteriors

The project was supervised by Dr. Marc Buchner, Case Western Reserve University and Knut Hybinette, Cleveland Institute of Art.

games, projects, space train
Erlang: The Movie!

Erlang Slides Posted

Nov 18th

Posted by Fred in Talks

No comments


Erlang: The Movie!

True wizards of concurrency.

About a month ago, Case Western alumnus Nick Berendt graciously sent Toby and me to ErlangCamp. We didn’t know a thing about Erlang beyond having heard it mentioned among programming languages that never made it “mainstream.” After 48 hours of intense Erlang instruction, we emerged wiser men. If you don’t know anything about Erlang, here’s a quick rundown:

Erlang is

  • Functional
  • Dynamically typed
  • Highly concurrent
  • Extremely fault-tolerant
  • Hot-swappable
  • Distributed

Toby and I gave an introductory talk on Erlang at CWRU Hacker Society last night. The slides can be found here.

Erlang: the Movie poster via Steve Vinoski.

erlang, erlangcamp, talks

A Few Notes About Job Interviews

Nov 6th

Posted by Fred in Life

No comments

It’s that time of year again. Being a student means going through a fairly predictable job application process for summer internships every year, and this year has been no exception. This is only my second time through the procedure, but I’ve already begun to learn a thing or two about what is otherwise a fairly stressful and emotional process. What I’m about to explain may seem like common sense to the seasoned applicant, but is worth repeating because it is really the key to both finding the right job for you and landing that job. As I’m studying computer science, most of these tips are geared toward software engineering positions; however, most of the sentiments expressed here are applicable in many other fields, too.

It’s Not About What You Know

This is especially true in computer science and related fields. Yes, you should know the fundamentals; you will never get hired if you don’t know the difference between a compiled and an interpreted language or what a binary tree is. However, what is (or should be) far more important to any potential employer is how you reason through problems as you tackle them. This is a skill that isn’t developed by reading books or blogs or attending lectures. Being able to effectively solve problems is a skill that is cultivated simply by problem solving. Most engineering disciplines, including software engineering, revolve around critical thinking and problem solving; being able to do both well is extremely important and should be sought after by any software shop worth their salt.

When you eventually get asked a question which seems virtually impossible to come up with an answer to, don’t panic. In fact, take this as a prime opportunity to show yourself off. Not only is the recruiter trying to test your limits, but he or she is also trying to gather very important information on what you do when asked to do something you may not necessarily know how to do. The best thing you can do in this situation is to say everything that goes through your head. A recruiter can’t possibly know what you’re thinking if you don’t say it. You will likely explore several possibilities that may or may not result in success and you will probably receive some coaching on the way. Finding the answer is worth bonus points, but what is more important is that the interviewer got a glimpse into your thought process.

Job Interviews Are a Two-Way Street

It’s often easy to feel like the whole interview process is just a test which determines whether or not you are fit to work somewhere in particular. However, it’s important not to forget that while job interviews serve as a way for employers to find out more about an applicant, it’s also a very important means for an applicant to find out more about an employer. There’s a lot to be learned about a company during a job interview. You should be paying as much attention to the recruiter as he or she is paying to you.

Think about what kind of questions you are being asked. Often the inquiries made by a recruiter reflect a lot of information about the employer. As explained earlier, questions that are out of your reach or involve some sort of reasoning are important. If you find yourself consistently stimulated by questions asked in the interview, chances are good you will be consistently stimulated on the job, too. Likewise, if all of the questions in the interview are about keywords in a specific language (I’ve had this happen at least once!), there’s a reasonable chance you could find yourself performing similarly mindless work for that employer.

It’s important for you to be asking questions in the interview as well. When a recruiter asks “Do you have any questions?”, the answer is never “no.” This is good practice for two reasons. Firstly, it shows the employer that you have some interest in the company and that you care about where you end up working. Secondly, and perhaps more importantly, it gives you a chance to actually figure out what the company is about, and whether or not it is actually somewhere you will want to be developing software. In a good interview you will probably have a long list of questions about the company, the technologies it uses, how it’s structured, and more. However, if you are unsure about the company, this is a great opportunity to find out whether or not it’s somewhere you want to end up. A good place to start is Joel Spolsky’s 12 Steps to Better Code. While that list is over 10 years old now, it provides some good starting points for probing how a software shop conducts itself.

Remember That You’re Worth Hiring

Not every job is for everyone, and not everyone is for every job. The job application process can seem tedious, one-sided, and disheartening, especially in a climate where jobs are scarce. Every person has a unique set of talents and skills they bring to the table, and some positions are better fits than others for your particular set. It may seem trite, but you really do have to just keep trying. Eventually you will find a position that’s a good fit for both you and the employer. Until then, try to stay confident if you don’t get a callback from that killer job you were hoping for. Confidence is a huge asset in the job search process, and maintaining it is invaluable.

interviews, jobs, programming

On the Merits of Blogging

Oct 30th

Posted by Fred in Uncategorized

1 comment

I used to think blogging was lame.

Now I have lots of stimulating things going on and too much stuff to keep in my brain. I want to put some of it here.

Maybe it will help someone some day.

This is my blog.

blogging, blogs, first!